A CONTRIBUTION TO MANUAL APTITUDE
MEASUREMENT IN INDUSTRY


THE VALUE OF CERTAIN DEXTERITY MEASURES 
FOR THE SELECTION OF WORKERS IN A 
WATCH FACTORY¹


MILTON L. BLUM 
College of the City of New York and Vocational Service for Juniors 


The problem of this investigation was to determine if
measures of manual aptitude could be used as aids in
the selection of female employees in a watch factory.
The term “manual aptitude” is used here to designate
specific measures of muscular control, particularly in eye-hand
coordination.

The management of a certain watch factory realized that 
its labor turnover was largely due to the hiring of applicants 
who did not have the aptitudes necessary for miniature as- 
sembly. It became interested in the possibility of using selec- 
tion techniques that might supplement the interview and re- 
duce the labor turnover. This problem of the watch factory 
offered the author an opportunity to determine if manual 
aptitude measures could be an aid in a specific industrial 
selection. The New York State Employment Service cooper- 
ated in the enterprise and the testing program was conducted 
at the Queens branch of that Service. 

The following questions were raised in an attempt to solve
this problem:


1. What criteria of proficiency in watch assembly work can
be established and what are their reliabilities?

2. How do the criteria of proficiency compare with each
other?

3. What tests can be used to measure the manual aptitude
necessary for watch assembly work?

4. What test indicators can be derived from the testing
program to compare with the criteria of proficiency and what are
their reliabilities?

5. How do the test indicators compare with each other?

6. What prediction of the criteria of proficiency exists for
the test indicators?

7. Assuming satisfactory prediction, what critical scores can
be established as aids in the selection of successful watch
factory workers and what are their practical values?

8. What does a follow-up show in relation to the results of
this investigation?

¹ This study was sponsored by Vocational Service for Juniors.


HISTORY 


Published reports of the use of tests of manual aptitude in 
industry are few in number although one hears unofficially of 
occasional application of these tests as a selection technique. 
Because of this, an inclusive survey of manual aptitude tests 
cannot be made. It appears, however, that the validation of
these tests has been performed usually upon workers either in
training in industry or actually “on the job.”

Burr (1) used a large battery of tests to select factory 
workers doing fancy feather pasting. Manual aptitude was 
measured by such tests as card sorting, motor control and 
feather sorting. She concluded that a positive relationship 
existed between the manual aptitude tests and a criterion of 
success, which was supervisor’s ratings, and she interpreted 
this as proof of the practical applicability of such tests in the 
selection of employees. Hayes (3) administered the Eastman 
Kodak test for assembling to a large number of girls employed 
at the Hawthorne Works of the Western Electric Company. 
The criterion of job success was an average of supervisor’s
ratings and output on a scale with eight divisions. The
correlation of criterion and test was +.31 ± .06.

Treat (9) administered the Scott Dexterity Test to a group
of 101 potential garment operators and obtained a correlation
of +.21 ± .06 between the criterion of success and the test
score. The proficiency criterion was an average of the actual
grades given each day for work in machine operating.

Hines and O’Connor (4) reported one of the earliest studies
of manual aptitude. In this investigation a finger dexterity
test devised by O’Connor was administered to a large group
of women applying for employment at the West Lynn Works
of the General Electric Company. The work required fine
finger movements. They concluded that,


“If only candidates performing the Test in 7.70
minutes had been hired, none would have failed to
make good and 85 percent of the desirable ones would
have been hired. Nearly every employer can afford
to turn away 15 percent of those who would make
good, if by so doing he eliminates all failures.”

O’Connor (7) contends that essential to manual aptitude is 
a particular degree of finger dexterity which is measured by 
such tests as these. It would seem that most repetitive tasks 
should be based upon manual aptitude as so defined. 

Otis (8) used the O’Connor tests in a battery administered
to a group of girls in a vocational school. The proficiency
criteria were ratings on a series of work samples produced by
the subjects during the training course and the time taken to
complete these tasks. Otis reported a correlation of -.17 ± .13
between speed of performance in the tasks and quality ratings
on the work samples, thus indicating that the proficiency
criteria were different measures of the work. Correlations of
+.20 ± .13 for finger dexterity with the quality criterion and
+.27 ± .13 for finger dexterity with the speed criterion
were reported. The tweezer dexterity test correlated
+.07 ± .13 with quality and +.46 ± .10 with speed.

A study by Candee and Blum (2), who administered the
O’Connor Finger and Tweezer tests to watch factory workers,
is reported in some detail because it forms the basis for the
present investigation. The subjects were selected by their
foremen as either superior (20 workers) or mediocre (17
workers). Total time taken to perform the test, ratings by
the examiner on test performance, and improvement on the
second half of the finger dexterity test were used as test
indicators. Speed of performance in the finger dexterity test
tended to differentiate the superior and mediocre workers in
the plant (D/σ diff. of the averages was 2.18). Tweezer
dexterity scores differentiated the plant workers from a general
population of applicants for factory work better than did the
finger dexterity scores, but they did not differentiate between
mediocre and superior workers as well as the finger dexterity
scores (D/σ diff. between the averages for the superior and
mediocre workers on the tweezer dexterity test was 1.01). The
examiner’s ratings made on test performance and improvement
on the second half of the finger dexterity test were slightly
indicative of successful employees. Critical scores of 5’30”
on the tweezer dexterity test and 7’30” on the finger dexterity
test were established as an aid in selecting watch assemblers.
Those scoring below were found to be less likely to be
successful. The follow-up, reported as a part of this investigation,
makes use of the workers in this preliminary investigation.

A review of the literature of manual aptitude tests which 
have been used in industry for the selection of employees shows 
that correlations of 0.20 and 0.30 are the more usual when 
performance on these tests and criteria derived from the job 
are compared. These low correlations suggest that manual 
aptitude tests must be carefully validated for a given job 
before they can be accepted as selection techniques. 


ANALYSIS OF WORK 


Before selecting the tests which would give the greatest 
promise of success, a study was made of the various types of 
work performed in the watch factory. The writer spent 
several days in the factory performing the various jobs of 
manufacturing, assembling, and inspecting the different watch 
parts. He interviewed both the foremen and employees and 
observed various employees at work. 

The work in the watch factory was done by girls sitting at a 
work bench. Each employee was responsible for one small 
specific task that was repetitive in nature. The several tasks
had in common the requirement of very close and sustained
attention because of the fineness of the work. Most jobs
required the use of both tweezers and fingers at various times.
The movements involved were small finger movements that did
not require strength but, on the other hand, demanded fine,
steady and deft manipulations. The three most important
items determined in the job analysis were as follows:
1. Fine finger movements,
2. Tweezer manipulations, and
3. The ability to continue to perform these delicate tasks
over long periods of time without increasing tension or
maladjustment.


CRITERIA OF PROFICIENCY 


The criteria of proficiency on the job used in the study were
length of employment, salary ratio, and foremen’s ratings.


Length of Employment 


A successful employee is likely to remain at work. An un- 
successful employee is either dismissed or else leaves because 
she is dissatisfied. In any event, it is reasonable to assume 
that length of employment is indicative of successful adjust- 
ment to the job. However, possible sources of error are recog- 
nized in this criterion of proficiency as the personal lives of the 
employees outside the plant were not known. 

Four categories of employees were made according to length
of employment, based on the opinion of the factory manager
concerning diagnostic periods in their training, as follows:


1. Those employed “less than one week,” which included a
few who considered themselves too good for the job. The
majority in this group left or were dismissed because they, or
the employers, felt that they would never “catch on.” Some
stayed to the end of the week merely to obtain the pay envelope
with the minimum amount of trouble and embarrassment.

2. The next category was the “one week to four months”
group. According to the factory manager, an employee could
learn her task adequately by the end of a four-month period
and those who left or were dismissed earlier usually could not
be regarded as having the ability necessary to perform the
assigned tasks.

3. The third category consisted of employees who remained
“four months to one year.” This group had made a
moderately successful adjustment to the plant according to the
management.

4. The “one year or longer” group represented the workers
whose training cost to the company was relatively the least.
These workers were considered by the management as
permanent employees. According to the factory manager, no
further discrimination in ability would be obtained by further
analysis of length of employment. It was his claim that these
employees were functioning according to their congenial
working pace and best met production standards.


The four categories shall hereafter be referred to as the 
criterion of length of employment. The number of employees 
in each category is given in Table I. 


TABLE I
Number of Employees in Various Employment Categories

Employment Category           Number of Employees
Less than one week                     50
One week to four months                44
Four months to one year                42
More than one year                     85
Total                                 221


Salary Ratio


By salary ratio is meant the average weekly earnings of an
employee for a three-month period divided by the average for
all employees and expressed in the form of an index with
twenty dollars per week as equal to 100. A weekly average
over a three-month period was selected as most representative
of a girl’s earnings. This period of time was considered long
enough to smooth out any unusual fluctuations. The same
three-month period was chosen for all employees so as to
avoid any differential influence of seasonal demands, although
this procedure limited the number of employees included in
the study. It was deemed advisable to sacrifice numbers in
order to be sure that all salary ratios would represent
production ability and not “peaks” or “slacks” in business, as would
have been the case if more than one three-month period were
used. Most employees were on a piece rate system and their
salaries were, therefore, a reflection of their production.
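
The index just defined can be illustrated with a minimal sketch. The wording of the definition leaves some doubt whether the divisor is the plant average or the twenty-dollar base; the sketch assumes that twenty dollars per week is the base that maps to 100. The function name and the earnings figures are hypothetical and are not taken from the study.

```python
from statistics import mean

# Assumption: twenty dollars per week is the earnings level set equal to 100.
BASE_WEEKLY_DOLLARS = 20.0


def salary_ratio(weekly_earnings, base=BASE_WEEKLY_DOLLARS):
    """Index an employee's mean weekly earnings against the base (base -> 100)."""
    if not weekly_earnings:
        raise ValueError("need at least one week of earnings")
    return 100.0 * mean(weekly_earnings) / base


# Hypothetical piece-rate earnings for the thirteen weeks of one three-month period.
weeks = [17.0, 18.5, 16.0, 19.0, 18.0, 17.5, 18.0,
         19.5, 18.0, 17.0, 18.5, 18.0, 19.0]
ratio = salary_ratio(weeks)  # mean earnings of $18.00 give an index of 90
```

Averaging over the whole period before indexing is what smooths out a single unusually good or poor week.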

The piece rate systems for the various jobs may not all have
been exactly equivalent, which would have introduced an error
in the salary ratio criterion. However, prior to the study, the
management attempted in all ways to equalize the piece rate
system for the various jobs and the criterion is probably as
exact as obtains in any factory.


Foremen’s Ratings 


The foremen’s ratings were the third criterion of proficiency 
used. Each foreman was asked to assign letter ratings to the 
workers in his department, judging whether they were A 
(excellent), B (good), C (average), D (poor), E (terrible).
In rating the employees, the foremen were asked to consider 
general usefulness to the plant as well as individual efficiency. 
The foremen were not given any special training in rating as 
this might have caused antagonism to the investigation. They 
were told that the ratings were desired as a check upon an 
investigation then underway and assurance was given that 
their ratings would not in any way be regarded as a reflection 
of their individual efficiency. So that the foremen’s ratings 
might not be influenced by individual change in attitude or by 
discussion between the foremen, all ratings were requested at 
the same time. 


TESTS USED 


The job analysis indicated that the ability to make fine finger 
movements, the ability to handle tweezers, and the ability to 
continue to perform these delicate tasks without increasing 
tension or maladjustment, were the important qualifications
for watch assembly work. Among the available tests of
manual aptitude it seemed that the O’Connor Finger and
Tweezer Dexterity Tests probably would measure these
qualifications with the greatest degree of success. These tests
require that the subject work with objects of small dimensions
and measure manual operations that are analogous to the job
performance of watch assembling.

The tests have the advantages that they are easy to adminis- 
ter and are inexpensive. The length of time to perform both 
tests is relatively short (close to twenty minutes) and the 
directions are easily understood. It was found possible to 
give the tests conveniently to more than one person at a time 
and score them accurately. The finger and tweezer test boards 
and pins which were used conformed to all the requirements 
specified by O’Connor (7). All testing was administered by 
the author. 


TEST INDICATORS 
The term, test indicator, is used to refer to a score or esti- 
mate derived from performance on the test. The test indica- 
tors used in the study were time score, improvement as repre- 
sented in decreased time on the second half of the test, and 
quality ratings on the performance. 


The Time Score 


The time to perform each test was taken by a stop watch 
and the score was recorded to the nearest second. 


Improvement 
The decrease (or increase) in the time required to complete 
the second half of the finger dexterity test as compared to the 
first half was adopted as the second test indicator. Whereas 
the time score is a measure of speed, this score is indicative of 
the ability to improve in rate of speed. 
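
A minimal sketch of the improvement indicator follows. It assumes half times are written in the minutes-and-seconds notation used in this paper (for example 3’45”) and that a positive value means the second half was faster; the sign convention, the function names, and the example times are assumptions made for illustration.

```python
def parse_time(text):
    """Convert a time written as minutes'seconds" (e.g. 3'45") to whole seconds."""
    cleaned = text.replace("’", "'").replace("”", '"').rstrip('"')
    minutes, seconds = cleaned.split("'")
    return int(minutes) * 60 + int(seconds)


def improvement(first_half_seconds, second_half_seconds):
    """Seconds saved on the second half; negative when the second half was slower."""
    return first_half_seconds - second_half_seconds


# Hypothetical subject: 3'45" on the first half, 3'30" on the second half.
first = parse_time("3'45\"")
second = parse_time("3'30\"")
gain = improvement(first, second)  # 15 seconds faster on the second half
```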


Quality Ratings 
In addition to these objective scores, quality of performance
of each subject on both the finger and tweezer tests was
estimated by the examiner. In rating a subject’s performance,
the examiner considered the manner in which the directions
were followed, the tension of the subject, and the technique
each subject used to perform the task. The following rating
scale was used in preparing the quality ratings:


                                    GOOD    AVERAGE    POOR
Accuracy of selection.
Grasp of pins.
Positioning of pins.
Placing of pins.
Hand tremor.
Condition of board.
Pace.
Position and movements of arm.
Body posture.


The above scale served as orientation for the examiner.
After completing these ratings upon test performance, the
examiner made a separate composite rating according to the
following scale: 1 (excellent), 2 (good), 3 (average), 4 (poor),
5 (unsatisfactory). In addition to this five point rating
scheme, a plus or minus might be assigned to any rating
according to whether the performance was judged slightly better or
worse. The composite rating was used as a test indicator of
the investigation.

The distributions of the quality ratings on the finger dex- 
terity test (163 subjects) and the tweezer dexterity test (143 
subjects) are presented in the first and third columns of Table 
II. The tweezer test ratings are higher than the finger test 
ratings. The reason for this is that the possible range of 
variability is less in performance on the tweezer dexterity test. 
The subject must pick up the pins in a certain manner or else 
the pins cannot be picked up at all. 

Both of these distributions of ratings are skewed toward the
superior end. This is because the rating system was originally
developed upon a highly heterogeneous group consisting of
several hundred female juniors as an additional aid to the
finger and tweezer test scores in their vocational guidance. A
comparison is made of the watch factory groups and a sample
of 420 applicants for factory employment at the Queens office
of the New York State Employment Service, for whom the same
ratings were secured, largely by the
author. The distributions of ratings for this group are shown
in columns 2 and 4 of Table II. The watch factory employees
are superior in quality ratings to the factory applicants. Only
2.4 per cent of the watch factory employees were rated below
average on finger performance, and the D/σ diff. between this
percentage and that of the general factory group receiving
below average ratings was 2.4. Also, on tweezer performance
a smaller percentage of watch factory employees than of the
general factory applicants were rated below average and the
D/σ diff. was 2.4.
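
The D/σ diff. statistic used in these comparisons is the difference between two percentages divided by the standard error of that difference. The paper does not state which standard-error formula was used, so the following is only a sketch under the assumption of the usual unpooled formula for two independent proportions; it should not be expected to reproduce the printed values exactly. The function name and the example figures are hypothetical.

```python
from math import sqrt


def critical_ratio(p1, n1, p2, n2):
    """Difference between two proportions divided by its standard error.

    Assumes independent groups and the unpooled standard error
    sqrt(p1*q1/n1 + p2*q2/n2); the paper's own formula is not stated.
    """
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) / se_diff


# Hypothetical groups of 100 each: 10 per cent against 20 per cent below average.
d_over_sigma = critical_ratio(0.10, 100, 0.20, 100)  # approximately 2.0
```

A ratio of about 3 or more was conventionally treated as a statistically reliable difference in work of this period, which is the sense in which the term is used in this paper.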


TABLE II


Distribution of Quality Ratings on Finger and Tweezer
Dexterity Test

                        Finger Dexterity Test        Tweezer Dexterity Test
Quality Ratings         Employees    Applicants      Employees    Applicants
                        in Watch     for Factory     in Watch     for Factory
                        Factory      Jobs            Factory      Jobs

Total number of
individuals             163          420             143          420

1 - Excellent            5.0%         3.6%           13.3%         5.9%
2 - Good                44.1%        39.2%           55.9%        50.1%
3 - Average             48.5%        50.9%           28.0%        36.3%
4 - Poor                 2.4%         5.8%            2.8%         7.1%
5 - Unsatisfactory       0%            .2%            0%            .5%





SUBJECTS AND CONDITIONS OF TESTING 


The subjects used in the study were applicants at the New
York State Employment Service, Women’s Industrial Division,
for factory employment. During their interviews, which
preceded any testing, they were informed by the employment
interviewer that they were to be tested. This was
accomplished in an informal manner so as to avoid the fear of an
expected examination. After the applicants had been tested
they went back to the interviewer who made the referrals to
the watch factory or to other employment according to
established interviewing criteria and without any exact knowledge
of the test results. According to the interviewer’s statement,
she referred to the watch factory only those people whom she
“felt” would be successful in the work of watch factory
assembling. Those accepted for employment in the watch
factory formed the tested group of this investigation.

A total of 258 subjects composed the sample for
investigation. Of these, 137 constituted the tested group mentioned
above and are used in the analysis of relationships of test
indicators and criteria of proficiency. A control group was formed
of 84 subjects who were employed by the watch factory upon
referral by the interviewer without being tested and is later
used for comparison in establishing critical scores. The 37
subjects of the preliminary study (2) formed the “follow-up”
group.

Most of the subjects composing the sample had had
industrial experience, but none, according to their own statement of
employment history, had had previous experience in watch
assembly work. None were eliminated as applicants for
employment in the watch factory on grounds of either nationality
or religion. All were white. Over ninety per cent were
between the ages of 20 and 25 years. Their average age was
22 years; the age range was between 18 and 40.

The tests were administered to three girls at a time. They
were seated one behind another so that no one could observe
another’s progress. The room in which the girls were tested
was specially designated as a testing room and was free from
all disturbances. Lighting, temperature, and atmospheric
conditions were favorable at all times. The girls knew that
they were going to take a test for a position in a watch factory.
In fact, some were extremely tense and “nervous” upon
entering the testing room and it was occasionally necessary to spend
a certain amount of time in the development of rapport. For
this reason, it was deemed advisable to modify somewhat the
instructions that O’Connor (7) uses.

The management of the factory was given no test results. 
At no time did it have any information about the functioning 
of the girls on the tests. This precaution was necessary so that 
any knowledge of test results could not influence the attitude 
of employers and foremen in rating the subjects. 


RESULTS 
CRITERIA OF PROFICIENCY 


Both salary ratio and length of employment are objectively 
determined measures, and the reliability of each should be indi- 
cated by a correlation coefficient of 1.00. 

Foremen’s ratings are estimates which are known to be
variable upon repetition. To establish the reliability of the
ratings used in this study, each foreman was asked to re-rate
his employees more than one year after the original ratings.
Forty-nine subjects were re-rated. The correlation coefficient
of the first and second ratings was +.60 ± .09.² This is
regarded as satisfactory reliability, as ratings at widely
separated times may indicate real changes in behavior.
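
The sigma attached to each correlation can be illustrated with a short sketch. The paper does not state its formula; the sketch assumes the classical large-sample expression σ_r = (1 - r²)/√N, which gives about .09 for r = +.60 and N = 49, in agreement with the value reported above. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def sigma_r(r, n):
    """Classical large-sample standard error of r (an assumed formula)."""
    return (1 - r * r) / sqrt(n)


# The re-rating of 49 subjects: r = +.60.
se = sigma_r(0.60, 49)  # about .091
```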

This correlation coefficient compares favorably with previous 
work of a similar nature. Kornhauser (5) obtained an aver- 
age correlation of + .60 when various traits were re-rated by 
the same raters. MacQuarrie (6) reported a reliability coeffi- 
cient of + .40 when repeated ratings by teachers of mechanical 
ability were compared. 

It is assumed that each of the proficiency criteria measures 
an aspect of success in the job of watch assembling. The inter- 
correlations are presented in Table III to indicate the degree 
of overlapping of criteria. 

² The σ rather than the P.E. is reported for all correlations. All
correlations reported were computed by the product moment method unless
otherwise indicated.














TABLE III


Correlation of Criteria³

                                                Coefficient of
Criteria Compared                               Correlation        σ
Salary Ratio and Length of Employment              +.44           .07
Foremen’s Ratings and Salary Ratio                 +.13           .10
Length of Employment and Foremen’s Ratings         +.25           .08





These intercorrelations are fairly low and are interpreted to 
mean that the three criteria measure relatively different aspects 
of the same thing, which we have called success on the job. As 
this is what is wanted in order to have distinctive criteria of 
proficiency, it may be said that this part of our investigation 
has been satisfactorily terminated. 
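
The overlap among the three criteria can be examined further by holding the third criterion constant for each pair. A minimal sketch using the standard first-order partial correlation formula follows; the function name is hypothetical, and because the inputs are the rounded coefficients printed in Table III, the results need not agree to the last digit with partial correlations computed from the raw data.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Rounded coefficients from Table III.
r_salary_length = 0.44
r_foremen_salary = 0.13
r_length_foremen = 0.25

# Salary ratio and length of employment, with foremen's ratings held constant.
salary_length_given_foremen = partial_r(
    r_salary_length, r_foremen_salary, r_length_foremen
)
```

Removing the influence of foremen's ratings changes the salary and length relationship very little, which is consistent with the low correlation of the ratings with either objective criterion.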


TEST INDICATORS 


In order to establish the reliability of the test indicators, an
unselected group of sixty-four subjects composing the sample
of applicants for positions at the watch factory was re-tested,
following an interval of one-half hour. The test-re-test
reliability coefficient for the time score on the finger dexterity
test was +.89 ± .03,⁴ which is satisfactory reliability according
to accepted standards.

Table IV presents the intercorrelations of the various halves
of the finger test on the re-test group.

The data indicate a high degree of relationship existing in
length of time to perform the task among the various halves of
the test and re-test. The average time to complete the halves
of the test and re-test decreases upon repetition. The average
time for the first half of the original test is 3’45”. The time
for the second half of the re-test is 3’25”. The correlation
between the times on these two halves is +.77. It is indicated
that half-time to perform the test is a fairly reliable measure.


TABLE IV
Intercorrelations of Half Times on Finger Dexterity Test
N = 64

                               Original Test,        Original Test,        Retest,
                               First Half      σ     Second Half     σ     First Half     σ
Original Test, Second Half       +.84         .03
Retest, First Half               +.82         .04      +.75         .05
Retest, Second Half              +.77         .05      +.87         .03      +.82        .04


³ The partial correlations were also computed. They are: salary ratio
and length of employment (with foremen’s rating out) +.43; foremen’s
ratings and salary ratio (with length of employment out) +.00; length
of employment and foremen’s ratings (with salary ratio out) +.22.

⁴ Data were not available to compute the reliability of test scores of the
Tweezer test, but their reliability is interpreted to be approximately the
same as for the finger test.

The reliability coefficient for the quality ratings on the finger
test is +.89 ± .03⁵ and is satisfactory. It is interesting to note
that the reliabilities of time score and quality ratings are
practically equal despite the fact that the former is an objective
measure and the latter consists of estimates.

The reliability coefficient for the improvement was
+.13 ± .12, which was obtained by computing the correlation
between the amount of improvement in the original test
(difference in time between second half and first half of test) and
in the re-test. Improvement, as thus defined, is an unreliable
test indicator.

Amount of improvement should be related to actual speed
of work. The nearer one approaches one’s physiological limit,
the less opportunity there is for improvement. There would
consequently seem to be some justification for investigating
relative improvement. The correlation between the relative
improvement in the original test (improvement/time on first
half) and the relative improvement in the re-test (improvement
in re-test/time on first half of re-test) is +.26 ± .11.
Improvement as thus computed is also an unreliable test
indicator. Relative improvement on the original test
(improvement/time on first half) and the relative improvement on the
re-test (improvement on total retest/total time on original
test) correlated +.24 ± .11, which again indicates unreliability
of the improvement measure.

⁵ Data were not available for the tweezer test and the reliability of its
quality ratings is estimated to be slightly lower than for the finger test
because of the more restricted range of ratings.
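
The first of the relative measures described above, improvement divided by time on the first half, can be sketched as follows. The function name and the example times are hypothetical, and the sign convention (positive when the second half is faster) is an assumption.

```python
def relative_improvement(first_half_seconds, second_half_seconds):
    """Improvement expressed as a fraction of the time taken on the first half."""
    if first_half_seconds <= 0:
        raise ValueError("first-half time must be positive")
    return (first_half_seconds - second_half_seconds) / first_half_seconds


# Two hypothetical subjects who each save 20 seconds on the second half.
slow_worker = relative_improvement(300, 280)  # 20 / 300, about .067
fast_worker = relative_improvement(200, 180)  # 20 / 200, exactly .10
```

The same absolute gain counts for more in the faster subject, which is the adjustment for nearness to the physiological limit that the relative measure is meant to supply.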

The first two test indicators, time scores and quality ratings,
can be considered as measures yielding a high degree of
reliability. It may be concluded that neither absolute
improvement nor relative improvement is a reliable test indicator. The
determination of validity is an important matter but may only
follow if reliability is present. The validity of time score and
quality rating will be presented later when test indicators and
criteria are compared.


Inter-Relation of Test Indicators 


A comparison of the test indicators was made. The various
intercorrelations, based upon 119 subjects, are presented in
Table V. It can be seen that all correlations, with the
exceptions of that of +.260 between quality rating on the finger
dexterity test and quality rating on the tweezer dexterity test and
the correlation of +.711 between tweezer dexterity time score
and quality rating on the tweezer test, are unreliable as their
sigmas are more than one third of the correlations.

With the one exception, that of quality rating on the tweezer
dexterity test and speed on the same test, all correlations
satisfy the accepted requirement for sub-tests of a test battery.
These low intercorrelations indicate that the different test
indicators measure either different things or different aspects of
the same thing.
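
The rule applied above, that a correlation is treated as reliable only when it is at least three times its sigma, can be checked mechanically. The sketch below applies it to the ten coefficients and sigmas printed in Table V; the function name is hypothetical.

```python
def is_reliable(r, sigma, multiple=3.0):
    """True when the correlation is at least `multiple` times its sigma."""
    return abs(r) >= multiple * sigma


# (coefficient, sigma) pairs as printed in Table V, read row by row.
table_v = [
    (0.194, 0.09),
    (0.141, 0.09), (-0.026, 0.09),
    (0.136, 0.09), (0.140, 0.09), (0.115, 0.09),
    (0.045, 0.09), (0.711, 0.04), (0.137, 0.09), (0.260, 0.08),
]

reliable = [(r, s) for r, s in table_v if is_reliable(r, s)]
# Only +.711 and +.260 pass, the two exceptions named in the text.
```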
TABLE V


Intercorrelations and Sigmas of Test Indicators Based
Upon 119 Subjects

                                    Finger               Tweezer              Improvement,         Quality
                                    Dexterity            Dexterity            Second Half          Rating
                                    Time Score     σ     Time Score     σ     of F.D.        σ     on F.D.     σ
Tweezer Dexterity Time Score          +.194       .09
Improvement, Second Half of F.D.      +.141       .09      -.026       .09
Quality Rating on F.D.                +.136       .09      +.140       .09      +.115       .09
Quality Rating on T.D.                +.045       .09      +.711       .04      +.137       .09     +.260     .08





TEST INDICATORS AND CRITERIA 


Quality Ratings on Tests and Criteria 


Table VI distributes the quality ratings on both the finger
and tweezer dexterity tests according to the proficiency
criterion of length of employment. A comparison of finger
dexterity quality ratings for superior and inferior groups yields
no reliable results. But there is a statistically reliable
difference between the “above average” ratings and the “average
and below” ratings on the tweezer dexterity test for those who
remained for a period of four months or longer. Whereas 64
per cent of those who received “above average” ratings remained
in employment four months or longer, only 39 per cent of those
who received “average and below” ratings remained in
employment for that length of time. The D/σ diff. is 3.6. A similar
comparison for a combination of the finger dexterity and
tweezer dexterity quality ratings does not yield any reliable
results. Whereas 64 per cent of those who received “above
average” ratings remained in employment four months or longer and
49 per cent of those who received “average or below” ratings
remained in employment for that length of time, the D/σ diff. is
only 1.7.


TABLE VI


Percentages of Quality Ratings Obtained with Respect
to Length of Employment

                                   Less than   7 Days to   4 Mos. to   More than
Quality Ratings              N     7 Days      4 Mos.      1 Yr.       1 Yr.       Total
F.D. Above Average          106    19%         25%         12%         44%         100%
F.D. Average and Below       47    10%         30%         15%         45%         100%
T.D. Above Average          108    15%         21%         17%         47%         100%
T.D. Average and Below       31    26%         35%          7%         32%         100%

A low relationship exists between the quality ratings on
either test and salary ratio. The correlation between finger
quality ratings and salary ratio is +.17 ± .11. The
correlation between tweezer quality ratings and salary ratio is
+.15 ± .11. The correlation between combined quality
ratings and salary ratio is +.05 ± .12.

The data in Table VII enable one to compare the quality 
ratings on test performance according to salary ratio; the 
table shows the percentage of superior and inferior ratings in 
salary groups. No statistically reliable differences exist when 
the various percentages of superior and inferior ratings are 
compared. 

A rather high relationship exists between foremen’s ratings 
and quality ratings by the examiner on finger dexterity test 
performance. The coefficient of contingency is +.50. This

TABLE VII

Percentages of Subjects Receiving ‘‘Above Average’’ and ‘‘Average and
Below’’ Quality Ratings in Various Salary Ratio Groups

                                                      Salary Ratio
Quality Rating                            N    45-64   65-84   85-104   105-124   Total

Finger Dexterity, Above Average           49    16%     20%     55%       9%      100%
Finger Dexterity, Average and Below       26    15%     27%     54%       4%      100%
Tweezer Dexterity, Above Average          58    13%     26%     53%       8%      100%
Tweezer Dexterity, Average and Below      13    31%      8%     61%       0%      100%

coefficient was computed as a result of a four by four classifi- 
cation (excellent, good, average, poor). The maximum C for 
such a table is + .86. 
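
The ceiling of +.86 follows from the fact that, for a square table with k classes, the coefficient of contingency cannot exceed √((k − 1)/k). A brief sketch, using hypothetical counts rather than the study's data, illustrates both the coefficient and its maximum:

```python
import math


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency(k: int) -> float:
    """Upper limit of C for a k-by-k table."""
    return math.sqrt((k - 1) / k)


# Hypothetical 4 x 4 table with perfect agreement between the two ratings.
perfect_agreement = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]

print(round(max_contingency(4), 3))
print(round(contingency_coefficient(perfect_agreement), 3))
```

Because C can never reach 1.00 in a four by four table, an obtained C of +.50 is best judged against this limit.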

The coefficient of contingency for the tweezer test quality 
rating and the foremen’s rating is + .24. Quality ratings on 
the tweezer dexterity test were generally higher and more 
homogeneous than the finger dexterity quality ratings, as 
already indicated, and homogeneity always reduces the cor-
relation. A lower correlation here may be a reflection of
homogeneity of ratings. When the quality ratings for both 
tests were combined and correlated with foremen’s ratings, 
the coefficient of contingency is + .30. 


Improvement and Criteria 
Improvement on the second half of the finger dexterity⁶
test was found to be an unreliable test indicator. However, 
for the sake of uniformity in reporting results, a comparison 
of this test indicator is made with each criterion of proficiency. 
The indication from Table VIII is that improvement does 
not discriminate the subjects in length of employment. The
larger standard deviations for the two longer employment
periods are due to a spreading of the range in the direction of
greater improvement. The differences between the averages
for the various periods of employment are all unreliable. The
largest difference occurred between the ‘‘less than 7 day’’
group and the ‘‘4 months to 1 year’’ group where the D/σ diff.
was 1.

6 The standard directions for administering the Tweezer Dexterity make
comparable measures for it unavailable.

TABLE VIII

Average Improvement on Second Half of Finger Dexterity Test
in Various Length of Employment Periods

                          Less than   7 Days to   4 Mos. to   More than
                           7 Days      4 Mos.      1 Year      1 Year

Number                       24          29          19          59
Average Improvement          5.7         6.5        11.3         6.9
Standard Deviation          14.8        14.4        18.2        18.7
σm                           3           2.7         4.5         3.4

A comparison of the amount of improvement on the second 
half of the finger dexterity test and salary ratio yields no sig- 
nificant results. The correlation of this test indicator and the 
criterion is −.06 ± .13. Any analysis of the distribution with
a correlation as low as this is futile. 

The data in Table IX show a trend for improvement on the 
second half of the finger dexterity test to be related to higher 
foremen’s ratings. Only sixty-eight subjects were available 


TABLE IX

The Percentage of Subjects Receiving Various Foremen’s Ratings
Improving on Second Half of Finger Dexterity

Foremen’s                % of Group Improving on
Ratings         N        Second Half of F.D.

A                3              100
B               17               82
C               42               64
D                6               50

for this comparison and the few subjects in the extremes of the
distribution discount any implications of this trend. How-
ever, the D/σ diff. for the A ratings and D ratings is 2.5, when
a comparison is made of the percentages in the two groups
that show an improvement on the finger dexterity. The D/σ
diff. of the combined A and B ratings as compared with the
C and D ratings is 2, when a similar comparison is made.


Time Score and Criteria 


The last test indicator, time taken to perform the tests, will 
be compared with each proficiency criterion separately and 
combined for the two tests. A comparison of the average time 
score on each of the tests for the various employment categories 
is made in Table X. It is shown that the average time scores 
on the finger and tweezer dexterity tests improve with length 
of employment. The reduction in the standard deviation 
indicates that the workers become more homogeneous in test 
performance. The coefficient of variation decreases on the 


TABLE X

The Time Score on the Finger and Tweezer Dexterity Tests
for Various Employment Categories

                          Less than    7 Days to    4 Mos. to    More than
                           7 Days      4 Months      1 Year       1 Year

F.D.         N               25           44           22           70
             Average        457”         432”         432”         417”
             S.D.       [illegible]       42”          43”          35”
             σm             8.2”         6.0”         9.3”         4.2”

T.D.         N               21           38           21           65
             Average        347”         345”         307”         317”
             S.D.       [illegible]       55”          45”          43”
             σm            10.9”         9.0”        10.0”         5.4”

F.D. and     N               21           38           21           65
T.D.         Average        778”         768”         739”         731”
(combined)   S.D.       [illegible]       62”          69”          65”
             σm            16.9”        10.1”        15.2”         8.1”

finger dexterity test from 9 for the ‘‘less than 7 day’’ group
to 8 for the ‘‘more than 1 year’’ group. The V for these same
groups decreases from 14 to 13 on the tweezer test.
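
The coefficient of variation expresses the standard deviation as a percentage of the mean. A small sketch, assuming that the last S.D. entry in each row of Table X belongs to the ‘‘more than 1 year’’ group, gives values near the whole numbers quoted in the text:

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """V = 100 * S.D. / mean, both in the same units (here seconds)."""
    return 100.0 * sd / mean


# "More than 1 year" group, values as read from Table X (seconds).
v_fd = coefficient_of_variation(35, 417)  # finger dexterity
v_td = coefficient_of_variation(43, 317)  # tweezer dexterity

print(f"V, finger dexterity:  {v_fd:.1f}")
print(f"V, tweezer dexterity: {v_td:.1f}")
```

Since V is a ratio, it allows the spread of the two tests to be compared even though their average times differ.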

There is also a decrease in V when the score is the combined 
time on the finger and tweezer dexterity tests; in this instance 
the change is from 9 to 8. The decreases in the time scores are 
all the more impressive when it is recalled that the subjects are 
of a factory population that is considerably skewed toward the 
faster scores. The averages for the ‘‘less than 7 days’’ group 
in Table X are equal to the 74th percentile on the finger dex- 
terity test and the 57th percentile on the tweezer dexterity test 
for girls between the ages of 16 and 24 seeking guidance in 
New York City.⁷ The averages for the ‘‘more than 1 year’’
group are equal to the 91st percentile on the finger dexterity 
test and the 77th percentile on the tweezer dexterity test. 

Critical ratios of the differences in time scores between the 
‘‘less than 7 day’’ group and the other employment categories
are reported in Table XI. A real difference exists between 


TABLE XI

D/σ diff. of Average Test Time Between the ‘‘less than Seven Day’’
Group and the Other Employment Categories

                              7 Days to    4 Mos. to    More than
                               4 Mos.       1 Year       1 Year
Less than 7 days
   F.D.                         2.5          2.1          4.3
   T.D.                          .2          2.7          2.5
   F.D. and T.D.                 .5          1.7          2.3

the average time for the ‘‘less than 7 day’’ group and the
‘‘more than 1 year’’ group in the finger dexterity test and
other differences for this test closely approach statistical re-
liability, as do those for the tweezer dexterity test. It is inter-
esting that a combination of the finger and tweezer dexterity
time scores does not increase the statistical reliability of the
differences between the categories of employment; but the
same trend for faster scores to accompany longer periods of
employment is present. Low intercorrelation between the
finger and tweezer dexterity tests and the fact that the time
scores are not comparable, one distribution being higher than
the other, undoubtedly explain the lack of increased reliability.

7 Norms in use at the Junior Consultation Service, 87 Madison Avenue,
New York City.

A comparison of time score on the finger and tweezer dex- 
terity tests with salary ratio yielded low positive correlations. 
There is a slight tendency for faster times on the test to go with 
higher salary ratios. But the correlations obtained are too 
low for the purpose of prediction in individual cases. The 
correlation between finger dexterity time score and salary ratio 
is +.26 ± .10; between tweezer dexterity time score and salary
ratio it is +.32 ± .10; when the finger and tweezer dexterity
test scores are combined, the correlation between time scores
and this criterion is +.39 ± .09.

A positive but statistically unreliable trend exists in the com- 
parison of time scores and foremen’s ratings. Those receiving 
ratings that were above the average of the group had an aver-
age time score on the finger dexterity test of 6’52”. This was five
seconds faster than the average for those with ratings of ‘‘C’’ 
and ‘‘D,’’ or average ratings and ratings below average, but 
this difference was not statistically reliable. A similar situa- 
tion existed for the tweezer dexterity test. The average score 
for the group receiving ratings above the average of the group 
was 5’14”. This was nine seconds faster than for the group
of workers receiving ‘‘C’’ and ‘‘D’’ ratings but this difference 
was not statistically reliable. For the combined test scores, 
the average score of those receiving superior foremen’s ratings 
was 12’07”, which was 12” faster than that of the inferior
group, but this difference was not statistically reliable. All 
critical ratios of differences in time scores for groups formed 
by foremen’s ratings in all test comparisons were less than 1.


CRITICAL SCORES


The preliminary study (2) suggested the possibility of using 
time scores in the tweezer dexterity test and the finger dexter- 
ity test as ‘‘critical scores’’ below which workers should not be 
employed. The results in this investigation have indicated 
that the time scores were related in varying degrees to the 
proficiency criteria of length of employment, salary ratio, and 
foremen’s ratings. They were shown to be particularly valu- 
able in the prediction of length of employment with its impor- 
tant relation to the cost of labor turnover. 

The ‘‘critical scores’’ of 5’30” for the tweezer dexterity test 
and 7’30” for the finger dexterity test or faster, which were
suggested in the preliminary investigation, will be applied to 
the subjects of this investigation to see what differentiation 
of employees will exist according to the proficiency criteria. 
If they separate the poor from the good workers, then, not 
only is a validation of the critical scores obtained, but a prac- 
tical recommendation for selection of watch factory employees 
can be made. 

Two hundred twenty-one of the subjects of our sample are 
classified in 3 selection groups as shown in Table XII. The
tested group, which has permitted the analysis of the data up 
to this point, is separated into two groups according as the 
subjects ‘‘passed’’ or ‘‘failed’’ the tests. In addition, the 
‘‘no test’’ group, mentioned earlier, is included, so that selec-
tion with and without the use of tests can be evaluated.


TABLE XII
Number of Subjects in Various Sample Groups Studied

  N     Groups

  66    ‘‘passed both tests’’
  71    ‘‘failed either or both tests’’
  84    ‘‘no tests’’
 221    Subjects of investigation exclusive of follow-up group

If the critical scores are to be valuable, it would be expected
that the group that ‘‘passed both tests’’ should compare most
favorably with the criteria. The ‘‘no test’’ group should be
next in rank. Finally should come the group that ‘‘failed
either or both tests.’’ As selection on the basis of the inter-
view was the same for all groups, it can be expected that the
‘‘failed either or both tests’’ group should be inferior on at
least one test performance.

Comparison of the critical scores according to selection cate- 
gories for the three groups in Table XII is made in Table XIII. 


TABLE XIII

The Percentages of the ‘‘Passed Both Tests’’ Group, the ‘‘No Test’’
Group and the ‘‘Failed Either or Both Tests’’ Group in the
Various Employment Categories

                                          Less     One Week   Four Mos.    More
Groups                            No.     than       to 4       to 1       than     Total
                                         1 Week      Mos.       Year      1 Year

‘‘Passed Both Tests’’ Group        78      7%        21%        15%        57%      100%
‘‘No Test’’ Group                  84     23%        27%         9%        41%      100%
‘‘Failed Either or Both
   Tests’’ Group                   74     24%        36%        12%        28%      100%

From these data it can be seen that the ‘‘passed both tests’’ 
group makes a superior vocational adjustment according to the 
criterion of length of employment. This group contains the 
smallest percentage of those employed ‘‘less than one week’’ 
and the greatest percentage of those employed ‘‘more than 
one year.’’ A statistically significant difference exists be- 
tween this latter group and the other two in the percentage 
who leave within one week (critical ratios of 3.2 and 3.4). The 
smallest percentage of subjects who are no longer employed 
at the end of four months is found in the ‘‘passed both tests’’
group. This percentage differs with statistical significance
from both the ‘‘failed either or both tests’’ group and the ‘‘no
test’’ group, with critical ratios of 4 and 3.1, respectively.
From these comparisons it can be seen that hiring on the basis 
of the critical scores will decrease the probability of engaging 
people who will be unsuccessful according to the criterion of 
length of employment or will increase the likelihood of hiring 
successful employees. 
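
The formula behind these critical ratios is not stated in the text. A minimal sketch, assuming the common unpooled standard error for the difference between two independent percentages, can be applied to the Table XIII figures for subjects remaining four months or longer (72, 50 and 40 per cent). It yields ratios of roughly 2.9 and 4.2, in the neighborhood of the reported 3.1 and 4 but not identical, so the original computation evidently differed in some detail:

```python
import math


def critical_ratio_percent(p1: float, n1: int, p2: float, n2: int) -> float:
    """D / sigma_diff for two independent percentages (unpooled variance)."""
    var1 = p1 * (100.0 - p1) / n1
    var2 = p2 * (100.0 - p2) / n2
    return abs(p1 - p2) / math.sqrt(var1 + var2)


# Per cent remaining four months or longer and group sizes, from Table XIII.
passed = (72.0, 78)
no_test = (50.0, 84)
failed = (40.0, 74)

cr_passed_vs_no_test = critical_ratio_percent(*passed, *no_test)
cr_passed_vs_failed = critical_ratio_percent(*passed, *failed)

print(f"passed vs. no test: {cr_passed_vs_no_test:.2f}")
print(f"passed vs. failed:  {cr_passed_vs_failed:.2f}")
```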

The number of subjects for the comparison of critical scores 
and salary ratio is smaller than in the preceding comparison. 
In order to avoid fluctuations in business conditions, a specified 
three month period of time for computing the salary ratio was 
used and only those subjects employed during that time were 
included. As an additional precaution, to avoid any influence 
in the comparison of subjects who were still learning the fun- 
damentals of their respective positions, no employee was 
included unless she had worked the previous three month 
period. 

The data comparing salary ratios of the ‘‘passed both tests’’
group, the ‘‘no test’’ group and the ‘‘failed either or both
tests’’ group are presented in Table XIV. Those who ‘‘passed
both tests’’ earned the most. The ‘‘no test’’ group was second,
and the ‘‘failed either or both tests’’ group earned the least. 


TABLE XIV

The Average Salary Ratios Obtained by the ‘‘Passed Both Tests,’’
‘‘No Tests’’ and ‘‘Failed Either or Both Tests’’ Groups

Groups                                        N         Average    S.D.     σm

‘‘Passed Both Tests’’ Group                   46          91.1     15.05    2.22
‘‘No Test’’ Group                             38          87.6     18.60    3.02
‘‘Failed Either or Both Tests’’ Group     [illegible]     73       15.95    2.82

The average salary ratio of the ‘‘passed both tests’’ group
differs with statistical reliability from the average salary ratio
of the ‘‘failed one or both tests’’ group. The D/σ diff. = 5.0.
The ‘‘failed one or both tests’’ group is inferior to the ‘‘no
test’’ group. The difference existing is reliable with a D/σ
diff. of 3.5. Although the ‘‘passed both tests’’ group is su-
perior to the ‘‘no test’’ group, the difference is not reliable
statistically and the D/σ diff. is .9.
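
These three ratios can be checked from Table XIV. The sketch below assumes the usual form for independent groups, the difference between averages divided by the square root of the sum of the squared standard errors, and reads the average of the ‘‘failed’’ group as 73 from the damaged row of the table. The function names are illustrative only:

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """sigma_m = S.D. / sqrt(N)."""
    return sd / math.sqrt(n)


def critical_ratio_means(mean_a, sem_a, mean_b, sem_b) -> float:
    """D / sigma_diff for two independent averages."""
    return abs(mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# (average salary ratio, sigma_m) as given in Table XIV.
passed = (91.1, 2.22)
no_test = (87.6, 3.02)
failed = (73.0, 2.82)  # average read from a damaged table row

cr_passed_failed = critical_ratio_means(*passed, *failed)
cr_no_test_failed = critical_ratio_means(*no_test, *failed)
cr_passed_no_test = critical_ratio_means(*passed, *no_test)

print(f"passed vs. failed:  {cr_passed_failed:.2f}")
print(f"no test vs. failed: {cr_no_test_failed:.2f}")
print(f"passed vs. no test: {cr_passed_no_test:.2f}")
```

The tabled standard errors themselves agree with S.D./√N for the two groups whose N is legible, for example 15.05/√46 ≈ 2.22.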


TABLE XV

Percentages of ‘‘No Test,’’ ‘‘Passed Both Tests’’ and ‘‘Failed Either
or Both Tests’’ Groups Receiving Foremen’s Ratings of
‘‘Above Average’’ and ‘‘Average and Below’’

                                                 Group Rating   Group Rating
Groups                                     N        Above         Average      Total
                                                   Average        or Below

‘‘Passed Both Tests’’ Group                53        34%            66%        100%
‘‘No Test’’ Group                          34        35%            65%        100%
‘‘Failed Either or Both Tests’’ Group      28        25%            75%        100%

The percentages of each of the three groups receiving fore-
men’s ratings above and below average on employee efficiency
are presented in Table XV. In order to maintain
a consistent point of view in the rating, the foremen were re-
quested to make all ratings at one time. While this meant the
sacrificing of subjects from comparisons, it eliminated a vari-
able error in rating. Both the ‘‘no test’’ group and the
‘‘passed both tests’’ group contain about equal percentages of
superior ratings by the foremen. They contain more than the
group that ‘‘failed either or both tests’’ but the differences are
not statistically reliable since each D/σ diff. is .9.

An additional problem remains to be investigated, which is a 
comparison of the value of the critical scores on the two tests 
alone or in combination. Table XVI presents the data on
lengths of employment for a group that passed only the finger
dexterity test, a group that passed only the tweezer dex-
terity test, and a group that passed both tests. These data are
equivocal. If employment ‘‘more than one year’’ is the cri-
terion, then the finger dexterity test is a more efficient selector
than the tweezer test. If employment ‘‘four months to one 
year’’ is the criterion then the tweezer test is better. But a 
comparison of the ‘‘passed both tests’’ group with the groups 
that passed only the finger or tweezer dexterity tests suggests 
that it is best to use a combination of both tests rather than 
the result of either test alone in predicting longer periods of 
employment. The percentage of the ‘‘passed both tests’’ 
group remaining employed ‘‘more than 1 year’’ is larger than 
either the group passing only the finger dexterity test or the 
group passing only the tweezer dexterity test for a similar 
employment period. The critical ratios between the groups 
are 2.2 and 3.3, respectively. 


TABLE XVI

Percentages of Groups Passing Either of the Dexterity Tests
or Both Tests in Various Employment Categories

                                       Per Cent of Group Remaining in Employment

                                      Less than   One Week   Four Months   More than
Groups                          N     One Week    to Four      to One      One Year    Total
                                                  Months        Year

Passed Both Tests               78        7          21          15           57        100%
Passed only Finger Dexterity    48       20          37           6           37        100%
Passed only Tweezer Dexterity   14       21          21          27           21        100%

Although the results are favorable when the critical scores 
(5’30” on tweezer dexterity test and 7’30” on finger dexterity
test) suggested by the preliminary study (2) are used, it was 
deemed advisable to determine what would happen to the 
selective value of the tests if other critical scores were used. 
When critical scores of 7’15” on the finger dexterity test and
5’15” on the tweezer dexterity test were used, thereby reducing
the test times by 15”, those who ‘‘passed both tests’’ were re-
duced from 78 to 55. But the percentage who remained in
employment over a four month period was the same, or 72 
per cent. The average salary ratio of this smaller group also 
is similar to the larger group of 78. Thus, no advantage is 
gained by lowering the times of the critical scores in deter- 
mining length of employment or salary ratio. It only serves 
to make selection of employees more difficult. 

When the critical scores are made as lenient as 7’45” on the
finger dexterity test and 5’45” on the tweezer dexterity test, 
18 additional subjects are added to the original 78. Under 
these conditions, only 66 per cent remain in employment longer 
than 4 months whereas 72 per cent did so when selected with 
critical scores of 7’30” and 5’30”. The average salary ratio of 
the group added is 69. This compares unfavorably with the 
91 salary ratio of the original group. Whereas 10 per cent of 
those added to the ‘‘passed both tests’’ group receive foremen’s
ratings of ‘‘above average,’’ 34 per cent of the original group 
received such ratings. 

Lowering and raising the critical scores by 15 seconds on 
each test does not increase the efficiency of selection. It is, 
therefore, deemed advisable to recommend 5’30” on the tweezer 
dexterity test and 7’30” on the finger dexterity test as the best 
critical scores. 
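
Applied as a selection rule, the critical scores amount to a simple two-part check on test times. The following sketch is illustrative only: the function and variable names and the sample applicant times are hypothetical, and times are expressed in seconds.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a time such as 7'30" to seconds."""
    return 60 * minutes + seconds


# Recommended critical scores (a time at or below the cutoff passes).
FINGER_CUTOFF = to_seconds(7, 30)   # finger dexterity test
TWEEZER_CUTOFF = to_seconds(5, 30)  # tweezer dexterity test


def classify(finger_time: int, tweezer_time: int,
             finger_cutoff: int = FINGER_CUTOFF,
             tweezer_cutoff: int = TWEEZER_CUTOFF) -> str:
    """Place an applicant in one of the two tested selection groups."""
    passed_finger = finger_time <= finger_cutoff
    passed_tweezer = tweezer_time <= tweezer_cutoff
    if passed_finger and passed_tweezer:
        return "passed both tests"
    return "failed either or both tests"


# Hypothetical applicants: (finger time, tweezer time) in seconds.
applicants = {"applicant 1": (440, 325), "applicant 2": (440, 340)}
for name, (finger, tweezer) in applicants.items():
    print(name, "->", classify(finger, tweezer))
```

Passing other cutoffs to `classify`, such as the 15-second stricter or more lenient pairs discussed above, shows how the size of the ‘‘passed both tests’’ group changes with the critical scores.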


‘‘FOLLOW-UP’’ OF THE PRELIMINARY STUDY


The preliminary study (2) recommended the use of the 
finger and tweezer dexterity tests to aid in the selection of 
watch factory employees. This conclusion was regarded as 
tentative because of the limited number of subjects and be-
cause the only proficiency criterion available was foremen’s
ratings. A follow-up of the subjects of this study offered the 
opportunity to investigate the number of the employees who 
were no longer employed and the salary ratios of all while 
employed according to their selection by foremen as superior 
or mediocre employees. In the preliminary study it was re-
ported that these two groups of employees were distinguished
by a difference in test scores with a critical ratio of 2.18 for 
the finger dexterity test and a critical ratio of 1.01 for the 
tweezer test. 

Table XVII shows the percentages no longer employed in 
the two groups after an expiration of two years along with the 
average time scores on the tests. 


TABLE XVII

Average Time Scores and Percentage of Workers Unemployed in Groups
Selected by Foremen as Superior and Mediocre Employees
(From Preliminary Study)

                            Finger Dexterity     Tweezer Dexterity    Percentage
                      N     Average      σ       Average      σ        of Group
                                                                      Discharged

Superior Workers      20     6’55”      34”       4’50”      37”          0
Mediocre Workers      17     7’32”      57”       5’04”      43”         18

The difference over the sigma of the difference between groups 
in the per cent discharged is 2. 

Table XVIII shows the average salary ratios of the two 
groups selected by foremen as superior and mediocre. 


TABLE XVIII⁸
The Average Salary Ratios for Superior and Mediocre Workers

                              Average
Group                  No.    Salary     S.D.     σm
                              Ratio

Superior Workers        15     110       14       3.6
Mediocre Workers        15      93       10.5     2.7


8 The number of workers in this table is reduced because some employees
were on an hourly rate rather than a piece rate system of payment and
were eliminated for this reason from the comparison.

A statistically reliable difference in salary ratio exists be-
tween the superior and mediocre groups (D/σ diff. is 3.7).
These data indicate that superior salary ratios go with superior 
time scores on tests as well as superior ratings by foremen. 

While limited in scope, the follow-up of the preliminary 
study (2) supports the evidence accumulated in the present 
investigation. It is suggested here that time scores on the tests
are predictive of the proficiency criteria of length of employ- 
ment and salary over a considerable period of employment. 


SUMMARY 


The following summary is presented according to the ques-
tions raised in the introduction.

1. Three reliable criteria of watch factory work were estab- 
lished: length of employment, salary ratio and foremen’s 
ratings. Length of employment and salary ratio have a reli-
ability of 1.00. Foremen’s ratings correlated +.60 ± .09.

2. The above criteria measure different aspects of pro- 
ficiency. The intercorrelations of these criteria ranged from 
+.13 ± .10 for salary ratio and foremen’s ratings to +.44 ± .07
for length of employment and salary ratio. 

3. The O’Connor Finger Dexterity and Tweezer Dexterity
tests were selected as manual aptitude measures for use in the 
watch factory based on a job analysis. 

4. The test indicators derived from the testing program were 
time scores on the finger and tweezer dexterity tests, quality 
ratings on performance in these tests, and improvement on the 
second half of the finger dexterity test. Time score and qual- 
ity ratings were accepted as having satisfactory reliability. 
The reliability coefficients were +.89 ± .03 and +.88 ± .03,
respectively. Neither absolute improvement nor relative im-
provement was found to be a reliable test indicator. Their
reliability coefficients were +.13 ± .12 and +.26 ± .11, respec-
tively.

5. The test indicators generally were distinctive measures of 
test performance. The intercorrelations of test indicators were
all low with but two exceptions, time score on the tweezer
dexterity test and its quality ratings (r = +.71 ± .04) and
quality ratings on the finger and tweezer dexterity test (r =
+.26 ± .08).

6. Time score in the finger and tweezer dexterity tests gen- 
erally showed the highest prediction of the proficiency criteria. 
Quality ratings on test performance were valuable for predic- 
tion in some instances. Improvement on the second half of 
the finger dexterity was not predictive of the criteria but this 
might have been expected from its low reliability coefficient. 
The facts demonstrating the predictive value of the test indi- 
cators will be summarized below. 

6A. Quality ratings during testing for finger dexterity were 
not indicative of length of employment. But those with 
‘‘average or below’’ ratings on the tweezer dexterity test were
found in the shorter categories of employment. Sixty-one per
cent of the group who received such ratings were no longer
employed after four months and only 27 per cent of those who
received ‘‘above average’’ ratings left or were dismissed within
this period. The difference was statistically reliable and D/σ
diff. was 3.6. When the quality ratings for both tests were
combined and distributed according to length of employment, 
no statistically reliable differences existed between those rated 
high and rated low in the test performances. 

6B. Low correlations were reported between the quality 
ratings on either dexterity test and salary ratios. The corre-
lations were +.17 ± .11 for the finger dexterity test and salary
ratios, +.15 ± .11 for the tweezer dexterity test and salary
ratios and +.05 ± .12 for the combined quality ratings and
salary ratios. 

6C. Quality ratings on the finger test were related to fore- 
men’s ratings of job performance with a coefficient of con- 
tingency of + .50. The C for tweezer quality ratings and fore- 
men’s ratings was + .24. A coefficient of contingency of + .30 
was obtained when foremen’s ratings were correlated with the 
combined quality ratings on both tests. (Maximum C = +.86.)

6D. Improvement on the second half of the finger dexterity
test was not predictive of length of employment. 

6E. Improvement correlated with salary ratio −.06 ± .13.
This is the only comparison of the entire investigation that 
shows a negative though unreliable relationship between test 
indicators and the criteria of proficiency. 

6F. Of the workers who received ‘‘A’’ ratings by foremen, 
100 per cent showed improvement on the second half of the 
finger dexterity test. Of the workers who received ‘‘D’’ rat- 
ings, only 50 per cent improved. Eighty-four per cent of those 
who received ‘‘B’’ ratings improved and 61 per cent of those 
who received ‘‘C’’ ratings improved. The difference in per 
cent between those who received ‘‘A’’ and ‘‘D’’ ratings is not
statistically reliable because of the limited number of subjects. 

6G. Time scores on both the finger and tweezer dexterity
tests were faster, on the average, as length of employment in-
creased. The D/σ diff. for the average time on finger dexterity
test between the ‘‘less than 7 day’’ and the ‘‘more than 1 year’’
groups was 4.3. In the same comparison on the tweezer dex-
terity test the D/σ diff. was 2.5. Combining the finger and
tweezer dexterity time scores did not increase the statistical
reliability of the difference, and D/σ diff. between the ‘‘less
than 7 day’’ and ‘‘more than 1 year’’ groups was 2.3.

6H. The correlation between finger dexterity time score and
salary ratio was +.26 ± .10; between tweezer dexterity time
score and salary ratio it was +.32 ± .10; and between the com-
bined test times and the salary ratio it was +.39 ± .09.

6I. The ‘‘above average’’ group according to foremen’s
ratings was 5 seconds faster on the finger dexterity test and 9
seconds faster on the tweezer dexterity test than the ‘‘average 
and below’’ group. The difference for the combined test scores 
between the two groups was 12 seconds in favor of those receiv- 
ing above average ratings. These differences were not statis- 
tically reliable. 

7. The practical value of the critical scores (time score of
5’30” or better on the tweezer dexterity test and 7’30” on the
finger dexterity test) which were suggested in the preliminary
study (2) is clearly indicated in this investigation. These 
scores discriminate employees in the watch factory with a con- 
siderable degree of exactness according to the criteria of pro- 
ficiency. This will be indicated in the following summary. 

7A. A comparison according to length of employment
showed that 7 per cent of the group that ‘‘passed both tests’’
left within one week whereas 23 per cent of the ‘‘no test’’
group and 24 per cent of the group of workers who ‘‘failed
either or both tests’’ were unemployed after one week. The
differences in the percentages between the first and the last
two were statistically significant with critical ratios of 3.2 and
3.4. The greatest possibility of prolonged employment was
found in the group that ‘‘passed both tests.’’ Of this group,
72 per cent remained four months or longer. This percentage
was significantly different from that of the ‘‘no test’’ group
(D/σ diff. of 3.1), and from that of the group which ‘‘failed
either or both tests’’ (D/σ diff. of 4+).

7B. A comparison according to salary ratios indicated that 
the group that ‘‘passed both tests’’ earned the most money. 
The earnings of this group were statistically different from the 
earnings of the group that ‘‘failed one or both tests,’’ with a 
D/σ diff. of 5. The ‘‘no test’’ group was superior in salary
ratio to the group that ‘‘failed either or both tests,’’ with a
D/σ diff. of 3.5. The group that ‘‘passed both tests’’ was
superior to the ‘‘no test’’ group but the D/σ diff. was only .9.

7C. A comparison according to foremen’s ratings showed
only a trend. The group that ‘‘passed both tests’’ was rated
by foremen as ‘‘better than average’’ in 34 per cent of the
cases. The group that ‘‘failed either or both tests’’ was rated
as ‘‘above average’’ in 25 per cent of the cases. This differ-
ence was not statistically reliable. The D/σ diff. was .9. No
differentiation between the ‘‘no test’’ group and the ‘‘passed
both tests’’ group is possible according to foremen’s ratings.

8. A follow-up of the subjects in the preliminary study (2) 
supports the findings of the present investigation that time
scores on the tests are indicators of proficiency. Two years 
prior to the ‘‘follow-up’’ 20 workers were selected by foremen 
as superior and 17 as mediocre workers in the watch factory. 
These groups were originally differentiated in test scores with 
a critical ratio of the difference of 2.18 for the finger test and 
of 1.01 for the tweezer test. The size of the groups, of course, 
affected the significance of their differences. The D/σ diff. for
the percentage discharged, which was the difference between
0 and 18 per cent, was 2. The D/σ diff. for the salary ratio
was 3.7.


APPLIED CONCLUSIONS


A positive answer can be made to the question of whether 
or not measures of manual aptitude will predict industrial pro- 
ficiency in the specific instance of the watch factory. For a 
complete answer to this question, comparable studies should 
be made to investigate if manual aptitude measures of this in- 
vestigation are applicable to other similar industries. 

It is evident from this investigation that the population of 
the watch factory is superior to the norm for factory popula- 
tions generally. The manual aptitude measures selected, based 
on a job analysis, i.e., the tweezer and finger dexterity tests 
devised by O’Connor, can be used to select competent watch 
factory workers with a higher degree of success than is pos- 
sible if the interview is the sole selection technique. If inex- 
perienced people achieve time scores of 7’30” or faster on the 
finger dexterity test and 5’30” or faster on the tweezer dex- 
terity test then management should be able to hire them with 
a greater degree of surety that they will be the employees 
wanted. This holds, of course, only in terms of the proficiency criteria
of this investigation, which were acceptable to the employer.
The tests and the test indicators devised upon them
are aids in the selection of watch factory workers. 

Such selection can benefit both employer and worker. The
former benefits because his labor turnover is reduced, his production
is increased, and his foremen get along better, according
to their opinion, with his employees. The worker is benefited
because she is less likely to suffer any shock of being dismissed
from a job that may require work which she cannot
perform. Also, she is less likely to work at a job that will pay
a low total wage, by way of the piece rate system, because of
her own limitations. These benefits to employer and employee
are real contributions which Industrial Psychology can make
in the selection of employees.

But it is the writer’s view that these contributions can be 
made only when the psychologist works with and solves a spe- 
cific industrial problem. A laboratory offers many possibili- 
ties for valuable contributions to knowledge. But in industry, 
influences are present which do not exist in the psychological 
laboratory. It seems necessary that the industrial psycholo- 
gist set up his own laboratory in the actual employment situa- 
tions, as was attempted here, so as to be sure of a practical 
answer to his problem. When he does this he may find that 
he must attack the problem in an unorthodox scientific manner;
he must forsake desirable scientific controls to solve his
practical problem. This was true of the investigation reported 
here. 

The study raises the question of what mental effect a test,
such as the finger or tweezer dexterity test, has on an
applicant. Unfortunately, the set-up of this study prevented
an answer to this question. ‘‘Nervousness and irritability’’
in the applicant should be investigated so that the examiner 
may know better how much consideration should be given to 
‘‘nervous people’’ in securing his measures of test perform- 
ance. 

Another problem for future investigation has already been
suggested. The quality ratings of the examiner can be treated
in a manner similar to the time scores. It may be possible to
devise critical scores of this test indicator that will divide
workers according to their adjustment over a considerable
period of time as seen through the eyes of the foreman, for it
has been indicated in this investigation that an examiner’s
ratings upon the quality of tested performance are related to
foremen’s ratings. But in doing this there are numerous difficulties,
as have been indicated to the writer in another investigation.
Entering into all foremen’s ratings, where it is known
that they will be used for promotion or dismissal, are feelings
of personal responsibility. In this investigation emphasis was
placed on the theoretical aspects; that is, it was understood
that a check-up of an experimental study was desired, not a
check-up of actual individual efficiency. The success of such
an investigation as suggested depends upon removing from the
ratings by the foremen the influence of personal responsibility.
THE PURDUE EYE-CAMERA: A PRACTICAL
APPARATUS FOR STUDYING THE
ATTENTION VALUE OF ADVERTISEMENTS¹


JAMES SPIER KARSLAKE 
Purdue University 


I. INTRODUCTION 


ADVERTISERS have long recognized the possibilities
inherent in eye photography as a method of evaluating
certain aspects of advertising copy. However, they
have been justifiably reticent in attempting any very extensive
use of this approach because of limitations which have charac- 
terized standard methods of eye photography. Even with 
these limitations, recent work, particularly that of Brandt (1), 
has attracted considerable attention. Brandt has described a 
camera which, utilizing the principle of corneal reflection, 
photographs a beam of light reflected from the cornea of the 
eye. The ray photographed, when reflected from the center 
third of the cornea, varies in direction and amount of displace- 
ment very nearly in proportion to the angular movement of 
the eyeball itself. Records of this movement, when made upon 
a continuously moving strip of film, indicate the eye move- 
ments of the reader. When these records are made upon two 
films, one moving horizontally and one vertically, eye move- 
ments in both directions may be recorded. This was the essential
principle followed by Brandt, and the camera which he
developed is better adapted to the use of advertisers than most
of the instruments previously available.


1 Based upon a thesis submitted by the writer to the Faculty of Purdue
University in partial fulfillment of the requirements for the degree of
Doctor of Philosophy, July, 1939. This thesis was directed by Dr. Joseph
Tiffin. Patent applications have been filed covering the method and equipment
described herein. No use of this method and equipment may be
made without the consent of The Purdue Research Foundation to whom
the patent rights have been assigned.

Unfortunately, the reading situation imposed by this method 
of investigation is an unfavorable one. The following are objectionable
factors inherent in the method:

(1) To keep the eyes directly in front of the camera and to 
obtain a record of eye movements alone, head movements must 
be eliminated through some arrangement for holding the head 
rigidly in place. 

(2) Since undistorted reflection occurs only from the center 
third of the cornea, the angular field of view through which 
eye movement is permissible may be no more than fifteen or 
twenty degrees. Therefore, the selection to be read, if placed 
at normal reading distance, must be restricted in size; or, if it 
is desirable to use an area the size of a double page spread 
from some periodical, the material must be placed beyond nor- 
mal reading distance. The larger the area, the farther away it 
must be placed if the photographic record is to picture move- 
ments of the eye with reasonable fidelity. 

(3) The reader is distracted by the presence of lights (for 
reflection from the eyes) within the peripheral field of vision, 
the general awkwardness of the reading situation, and the difficulties
involved in being properly placed in preparation for
taking the eye movement pictures. 

(4) Interpretation of the records by projection is a difficult 
and time consuming task. 

A recent method, overcoming some of these objections, depends
upon the measurement of corneo-retinal potentials accompanying
movement of the eyes. Early work was done by
Meyers (5) in connection with studies on nystagmus. Halstead (2)
has recently applied the method to the quantitative
study of eye movements in both the horizontal and vertical
planes. Hoffman, Wellman, and Carmichael (3) have compared
the results with those obtained by corneal reflection.
Too little work has been done to permit any conclusions at this
time as to how well the method may be suited to studies on the
attention value of advertising.

An objective method for the evaluation of advertising copy, 
useful either before or after publication has occurred, appears 
desirable. If this is to be attained through eye movement photography,
the apparatus should be light and portable, the reading
situation normal, and the photographic record easily inter- 
preted. The method should yield results that are both reliable 
and valid. 


II. THE PROBLEM


In view of the needs peculiar to an evaluation of advertising 
copy in a practical situation, the problem of this investigation 
became the development of an eye-movement camera that would 
approximate as closely as possible the following specifications: 

(1) The equipment must be light and portable.

(2) The reading material should be placed at normal reading
distance.

(3) The reader should have an unrestricted field of view.

(4) The reading situation should be a normal one with the
reader at leisure to leaf freely through the material
to be read.

(5) The reading material should be continuously identified,
frame by frame, on the film itself.

(6) The photographic record should be easily interpreted.

(7) Interpretation of the record should be sufficiently accurate
for differentiation between adjacent areas of an
advertisement.

(8) The method and results should be reliable and valid.


III. PRELIMINARY INVESTIGATION


Direct photography of the eyes with an ordinary camera was 
tried under various conditions. With the reading material 
placed on an easel at normal reading distance from the reader, 
and a light so placed within the peripheral field of vision as to 
give an image on the cornea of the eye, a number of still pic- 
tures were taken with the reader looking in certain specified 
directions at the reading material. Numerous placements of
both eye lights and camera were tried in an attempt to determine
whether or not any simple relationship existed between
locations of the image of the eye light on the cornea with reference
to the pupil and the direction of the reader’s fixation.
No simple relationship was found; instead, a sense of direction
for the eyes as photographed was as easily estimated from the
picture of the eye itself as from any single configuration of the
pupil and corneal image.

The method suggesting itself, under the circumstances, was 
indirect photography of the eyes by means of a partially silvered 
mirror, the camera to be placed directly above the reader with 
the mirror halfway between the reader and the reading mate- 
rial. In this way, still pictures were obtained of the reader’s 
face by reflection, with the bottom of the reading material appearing
on the forehead of the image. Early investigations
were made to determine how successfully adjacent areas could
be distinguished from one another on interpretation of the
pictures. They indicated that adjacent areas as small as two
inches square could be reliably differentiated from one another.
Moreover, interpretation was as easily made whether the
reader was wearing glasses or not.

Fig. 1. In the center is a typical picture taken with the Purdue Eye
Camera. The reader whose eyes were photographed was looking at the
girl on the bicycle.

Ten still pictures were then taken of a reader looking at vari- 
ous items of interest in a copy of the Saturday Evening Post. 
These were mounted on each spread in the manner shown in 
Figure 1. Six people, without previous experience, were asked 
to identify the areas from the sense of direction apparent in 
each picture. The results of this simple preliminary study are 
given in Table 1. The judges correctly identified 58 out of 
their combined total of 60 placements, or 97 per cent of them. 
These results were taken to indicate that the method should be 
expected to yield results that were both reliable and valid. It 
also appeared that the material on any given page might as 
readily be divided into areas of a size and shape suggested by 
the layout of the material itself as into a plan of any precon- 
ceived regularity superimposed on each page without refer- 
ence to the arrangement of the items of interest. 






IV. APPARATUS AND TECHNIQUE 


Since the results from the preliminary studies indicated that 
satisfactory results might be expected by indirect photography 
of the eyes, the equipment was arranged as shown in Figure 2. 
As can be seen from the figure, the material is placed at normal 
reading distance, and the reader is free to leaf through the 
periodical at his leisure. The partially silvered mirror, 12 by 
17 inches, placed between the reader’s face and the material to 
be read, transmits 50 per cent of the incident light and reflects 
22 per cent of it, absorbing the remainder. With illumination 
on the reading material arranged as shown, a mirror of such 
low reflecting power appears to the reader to be wholly trans- 
parent, and is large enough to enable him to see plainly all of 
the periodical beyond it. An image of the face is then reflected
at a distance as far behind the mirror as the reader is placed in
front of it. With the mirror tilted upward slightly, this image
can be seen in the plane of a piece of black paper affixed to the
bottom edge of the easel. Under the circumstances, the reader
is unable to see any reflection in the mirror. Moreover, the
reflected image of the face is so enhanced by contrast with the
black background as to be easily photographed. The camera,
placed at an angle overhead, records on motion picture film a
picture of the reader’s face by reflection, together with an
image on the forehead of the lower portion of the printed
page as shown in Figure 3. Motion pictures taken on eight
millimeter film at sixteen frames per second² furnish a continuous
record of eye behavior as the reader leafs through the
material. From this record can be determined page identification,
first looks on each page, the length of time given to any
item, and the time at which the page is turned.

Fig. 2. The Eye Camera in operation. The reader leafs through the
magazine, stopping to look or read wherever and as long as he wishes.

Fig. 3. A single frame picture typical of those taken at 16 frames per
second with the camera illustrated in Figure 2. Notice that the picture
serves to identify the page as well as to indicate the fixation direction of
the reader’s eyes.

Fig. 4. Arrangement of the equipment for interpreting the eye pictures
(frame by frame).


2 The camera has recently been rebuilt to photograph at the rate of
three frames per second.
For the purpose of interpretation, these pictures are projected
frame by frame upon a small screen placed in the center
of each spread as shown in Figure 4. Under these circumstances,
with very little practice one can readily determine
from the sense of direction apparent in a succession of pictures
the sequence in which areas of interest attract the eyes of the
reader. A count of the number of frames spent on any given
area is a measure of how long the area held the reader’s attention.
First looks and last looks are as easily recorded.
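
The frame-by-frame tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize_film`, the area labels, and the sample film are hypothetical, and the one-label-per-frame list stands in for the judgments an interpreter would make at the projector. The default of 16 frames per second is the filming rate given in the article; it is left as a parameter because the camera was later rebuilt for a slower rate.

```python
from itertools import groupby


def summarize_film(frame_areas, fps=16):
    """Summarize a film given one area label per frame.

    frame_areas: list of labels; None marks a frame that could not be read.
    fps: filming rate in frames per second (16 in the article).
    """
    # Drop unreadable frames before looking for changes of area.
    readable = [area for area in frame_areas if area is not None]

    # Collapse runs of identical labels to get the order of looks.
    sequence = [area for area, _ in groupby(readable)]

    # Count frames per area, then convert the counts to seconds.
    frame_counts = {}
    for area in readable:
        frame_counts[area] = frame_counts.get(area, 0) + 1
    seconds = {area: n / fps for area, n in frame_counts.items()}

    return {
        "sequence": sequence,
        "first_look": sequence[0] if sequence else None,
        "last_look": sequence[-1] if sequence else None,
        "seconds": seconds,
    }


# Hypothetical film: 52 frames spread over three areas of one advertisement.
film = ["heading"] * 8 + ["picture"] * 24 + ["heading"] * 4 + ["signature"] * 16
summary = summarize_film(film)
print(summary["sequence"])
print(summary["seconds"])
```

Returning to an area later in the film adds to its total time but appears as a separate look in the sequence, which keeps the two measures the text distinguishes (order of looks and length of time) apart.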

The whole technique may be thought of as an extension and
objectification of the scanning method used by Nixon (6). The
present method gives to Nixon’s method a means of recording
for permanent use or more precise analysis.


V. RELIABILITY AND VALIDITY OF THE TECHNIQUE 


The next step was to evaluate this method of investigation 
as applied to the attention value of advertising. One subject 
was instructed to practice a certain routine order of looking at 
successive items of interest on seven consecutive spreads in one 
copy of the Saturday Evening Post. When the reader had 
this routine well learned, eye-movement pictures were taken 
by the foregoing method as she leafed through the magazine 
looking at the material in the practiced order. Four people 
were then asked to interpret this film on which there were ap- 
proximately 1500 frames. The interpreters worked inde- 
pendently. They ranged in previous experience in inter- 
preting pictures of this kind from six weeks of intensive work 
on similar pictures to no previous experience whatever. The 
items of interest³ were arranged in the order followed by
the reader, both as given in the reader’s report and as determined
by each judge through his interpretation of the film.
Numbering in sequence the areas reported by the reader and
those determined by each judge on interpretation of the film,
rank order correlations were made between the film interpretations
of each judge and every other judge, and between the
film interpretation of each judge and the reader’s report.
The rank order correlations between the interpretations of
each judge and every other judge, given in Table 2, indicate
the reliability of interpretation of the film. Various measures
of agreement between the interpretation of each judge and the
reader’s report are given in Table 3. It should be noted that
validation of the method is in terms of a fallible criterion,
namely, the reader’s report, but it seems likely this is the best
criterion available. The measures of agreement between the
film interpretation of each judge and the reader’s report are
satisfactorily high; in particular, the rank order correlation on
the sequence of areas on which the reader and each judge agree
is so high as to permit the conclusion that the method, at least in
terms of this criterion, is a valid one, certainly high enough to
justify the use of the method for group prediction.


TABLE 2

Pearsonian r Corresponding to the Rank Order Correlation between the
Sequence of Areas as Determined by the Film Interpretation of Each
Judge and That of Every Other Judge

                  Judge #2    Judge #3    Judge #4
                     r           r           r
Judge #1 ......     .99         .99         .96
Judge #2 ......                 .98         .98
Judge #3 ......                             .97


3 These areas are identified on pictures of the seven spreads that were
used and are illustrated in the appendix of the thesis referred to in footnote
1 at the beginning of this article.
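
The agreement measures used in this validation can be sketched as follows. This is a minimal illustration, not the author's procedure: the function names and the six-area sequences are hypothetical, each area is assumed to occur once in each sequence, and the rank-difference formula is the untied form. Only the counts of areas spotted out of 60 (54, 47, 42, 41) are taken from Table 3.

```python
def rank_difference_rho(x, y):
    """Rank order correlation by the rank-difference formula (no ties)."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def agreement(reader_seq, judge_seq):
    """Compare a judge's sequence of areas with the reader's report.

    Assumes every area appears at most once in each sequence.
    """
    judge_set = set(judge_seq)
    reader_set = set(reader_seq)

    # Areas the reader reported that the judge also found, in reader order.
    common = [area for area in reader_seq if area in judge_set]
    percent_spotted = 100 * len(common) / len(reader_seq)

    # Rank the common areas by their position in each sequence.
    reader_rank = {area: i + 1 for i, area in enumerate(common)}
    judge_common = [area for area in judge_seq if area in reader_set]
    judge_rank = {area: i + 1 for i, area in enumerate(judge_common)}

    rho = rank_difference_rho(
        [reader_rank[a] for a in common],
        [judge_rank[a] for a in common],
    )
    return len(common), percent_spotted, rho


# Hypothetical sequences: the judge misses E, adds X, and swaps B and C.
reader = ["A", "B", "C", "D", "E", "F"]
judge = ["A", "C", "B", "X", "D", "F"]
spotted, percent, rho = agreement(reader, judge)

# Counts of areas spotted out of 60, as given in Table 3.
table3_percent = [round(100 * n / 60) for n in (54, 47, 42, 41)]
print(spotted, round(percent, 1), round(rho, 2), table3_percent)
```

Restricting the correlation to areas on which both parties agree, as the text does, means a judge can miss many areas and still show a very high rank order correlation, which is why the number spotted is reported alongside it.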


TABLE 3

Various Measures of the Extent of Agreement between the Number and
Sequence of Areas as Determined by the Film Interpretation of Each
Judge and as Reported by a Trained Reader Looking through the Magazine
Following a Practiced Sequence

                                    Judge #1                             Judge #4
                                    (most ex-    Judge #2    Judge #3    (least ex-
                                    perienced)                           perienced)

Of the 60 fixation areas reported
  by the reader, the number
  spotted by each judge on
  interpreting the film is: ....      54           47          42          41

Of the total number of fixation
  areas reported by the reader,
  the percentage spotted by each
  judge on interpretation of the
  film is: ....................      90%          78%         70%         68%

Of the total time spent in looking
  at the material, the percentage
  of time allotted by each judge
  to areas other than those
  reported by the reader is: ...     1.79%        11.6%       9.86%       4.67%

The rank order correlation between
  each judge and the reader on
  the sequence of areas on which
  both agree as to identity is:      .99          .99         .99         .98


VI. INVESTIGATION ON THE ATTENTION VALUE OF SEVEN SPREADS


Seven consecutive spreads from the Saturday Evening Post
for January 21, 1939, were chosen for investigation by the
method described in this study. On the seven spreads chosen
appeared five full-page advertisements, five partial-page advertisements,
three cartoons, and a poem. Of the full page
advertisements, three were in color, and two were black and
white. The selection of partial-page advertisements contained
three in black and white and two in color. All of the cartoons
and the poem were in black and white.
The copy as it appeared on consecutive pages was as follows:


Pages 46-47
    Hastings Piston Ring              Alliance Insurance Company
    4 page; 2 colors                  Single column; black and white

Pages 48-49
    Cartoon                           Coffee Producers’ Association
    34” square; black and white       Full page; black and white

Pages 50-51
    Zonite Antiseptic                 Squibb Dental Cream
    Single column; black and white    4 page; 2 colors

Pages 52-53
    Kool Cigarettes                   Cartoon
    Full page; 4 colors               2” x 3”; black and white

Pages 54-55
    Cartoon                           Conoco Oil
    3” x 4”; black and white          Full page; 2 colors; bleed

Pages 56-57
    Pyroil                            Arrow Shirts
    4 page; 2 colors                  Full page; 4 colors

Pages 58-59
    Poem                              Bayer Aspirin
    3” x 4”; black and white          Full page; black and white

The issue used in this study had not yet been placed on the 
newsstands and had not previously been seen by any of the 
readers who were asked to take part in the investigation. 

A copy of this Saturday Evening Post was placed on the 
easel before each of forty-eight men and fifty-two women, and 
the easel adjusted to the eye height of each reader. The 
instructions were given: ‘‘I want you to sit here at ease and, 
while listening to the radio, leaf through the Saturday Eve- 
ning Post in the same manner in which you might leaf through 
a similar copy should you have a few moments to spare while 
at home.’’ Eye-movement pictures were then taken of each 
reader while leafing through the selections previously chosen 
for investigation. 

In presenting copy for evaluation precisely as it appeared 
on publication, each advertisement was placed in competition 
with whatever happened to appear on the opposing page. In 
other words, the circumstances under which any given adver- 
tisement was evaluated were those under which the advertise- 
ment had appeared in print. 
The films were interpreted in the way described in the section
on ‘‘Apparatus and Technique’’ using the equipment
illustrated in Figure 4. The number, size, and shape of areas 
that were of interest for identification purposes were dictated 
by the layout of material on each page. For example, a cer- 
tain full page advertisement may have contained a heading, 
three pictures, two sections on context, and a signature, all 
prominently displayed. There were then considered to be 
seven areas of interest on that page. These seven areas were 
then outlined on a tracing paper insert placed over the adver- 
tisement, and notations were made of the area first to attract 
the reader’s attention, the total number of frames spent on 
each area, and the number of frames spent on the advertisement
as a whole.⁴


VII. RESULTS OF THE INVESTIGATION ON THE ATTENTION VALUE
OF THE SEVEN SPREADS


The advertisements, cartoons, and poem considered in the 
foregoing investigation were ranked in order of merit on the 
basis of a measure of the median and mean length of time 
spent on each of them. Time measurements were a count of 
the number of frames spent on each area; each frame repre- 
senting one-sixteenth of a second. The rankings in order of 
merit were obtained for one-half of the men, one-half of the 
women, and one-half of the group as a whole regardless of sex ; 
these were correlated with the rankings obtained for the other 
half of each corresponding group. The orders of merit and 
rank order correlations for each group are given in Table 4. 
These rank order correlations are a measure of the reliability 
of evaluation, by each group of subjects, of the advertisements, 
cartoons, and poem, from the most attractive to the least 
attractive one found in the series. The criterion of attractiveness
is the amount of time spent on each one of them. The
Pearson reliability coefficients estimated from these rank order
correlations were found to range from .82 to .96. One may,
therefore, conclude that the relative evaluation of copy, in
terms of the time spent in looking at the material as determined
through eye movement pictures of the kind described,
is highly reliable.


TABLE 4

A Measure of the Consistency with Which the Materials Included in This
Study Were Ranked in Terms of the Median and Mean Length of
Time Spent on Each Item of Interest

Order of merit based on:     Median length of time      Mean length of time
                             24M     26W     50P        24M     26W     50P

Rank order correlation:      .79     .84     .95        .68     .80     .90
Equivalent Pearson r:        .80     .85     .96        .70     .82     .91
r_s:                         .89     .91                .82     .90

NOTE: r_s is the predicted split half reliability for samples of 96 men and
104 women as stepped up by the Spearman Brown prophecy formula
in order to make the results comparable with the split half reliabilities
obtained for 100 people. Comparable split half reliabilities and
their standard errors, then, are

r:                           .89     .91     .96        .82     .90     .91
σ_r:                         .075    .057    .029       .124    .069    .063

24M: interpret as 24 men
26W: interpret as 26 women
50P: interpret as 50 people


4 The seven spreads selected for investigation, together with identification
of the areas on each page, and sample data are reproduced in the
appendix of the thesis referred to in footnote 1 at the beginning of this
article.
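
The stepping-up mentioned in the note to Table 4 can be illustrated with the Spearman Brown prophecy formula, r_s = k r / (1 + (k - 1) r), with k = 2 for a sample doubled in size. The sketch below is not the author's working; the function name and dictionary keys are assumptions made for illustration, and the inputs are the equivalent Pearson r values printed for the half-groups of 24 men and 26 women.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when the sample is made k times as large."""
    return k * r / (1 + (k - 1) * r)


# Equivalent Pearson r values for the half-groups, as printed in Table 4.
half_group_r = {
    "24M, median": 0.80,
    "26W, median": 0.85,
    "24M, mean": 0.70,
    "26W, mean": 0.82,
}

stepped_up = {name: spearman_brown(r) for name, r in half_group_r.items()}
for name, value in stepped_up.items():
    print(f"{name}: {value:.3f}")
```

Three of the four stepped-up values agree to two decimals with the printed r_s row (.89, .82, .90). The input of .85 gives about .92 rather than the printed .91, a difference that may reflect the author working from unrounded coefficients.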

Rank order correlations were obtained for both men and 
women on the consistency with which they were attracted to 
different areas within each advertisement in terms of the time 
spent in each area. These correlations are given in Table 5. 


TABLE 5

Consistency with Which Readers Divide Their Time on the Several Parts
of the Layout in Looking at Advertisements

Column headings:
  Correlations between the amount of time spent on each part of each
    advertisement by the first 24 men and that spent by the second 24
    men in the group
  Correlations between the amount of time spent on each part of each
    advertisement by the first 26 women and that spent by the second 26
    women in the group

Hastings ...........   4     0.00    .89    1.00    .89
Alliance ...........         1.00   1.00    1.00   1.00
Coffee .............   7      .78    .97     .94    .80
Zonite .............   3      .68    .68     .68   1.00
Squibb .............   3     1.00    .68     .68    .68
Kools ..............   7      .70    .85     .95    .89
Conoco .............   9      .94    .92     .78    .86
Pyroil .............          .68    .68     .94    .68
Arrow ..............  14      .62    .98     .96
Bayer ..............          .95    .93     .85    .71







The results indicate a consistent order of merit for the areas 
within an advertisement in terms of the time spent on each 
area. 
The results of a nationally known survey on certain of the
advertisements considered in this investigation were obtained 
for comparison with the results of this study. In the course 
of the survey, 206 men and 201 women had been interviewed. 
In each instance the person was shown an advertisement and 
asked the question ‘‘ Have you or have you not seen this adver- 
tisement before?’’ The percentage of each sex who answered 
in the affirmative was then determined and used as an index 
to evaluate the given advertisement. 

On the basis of these percentages, the advertisements were
ranked in order of merit for comparison with the rank orders
obtained by the photographic method of this investigation.
The rank order correlations between the survey and the
median or mean length of time spent on each advertisement
are given in Table 6. These correlations are too low to be
statistically significant. In other words, there is little or
no correlation between the two methods, and they are evidently
not measures of the same thing. If one is a measure of the
attention value of a given advertisement, the other clearly is
not. In view of the fact that the method used in the survey
did not correct the recognition figures for the confusion element
which has been emphasized by Lucas (4), and in view of
the unknown reliability of the survey results, it is quite possible
that the eye camera gives the more accurate measure of
attention value.
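
The rank order correlations reported for the men in Table 6 can be checked from the ranks themselves. The sketch below takes the three columns of ranks for the eight advertisements in table order, reading the tied ranks as halves (6½, 3½, 2½); that reading is an assumption, though each column then sums to 36 as ranks 1 through 8 must. The rank-difference formula is applied without a correction for ties, and the conversion r = 2 sin(πρ/6) is a standard one that the article does not name, so its use here is also an assumption.

```python
import math

# Ranks for the eight advertisements, men's half of Table 6, in table order.
# Tied ranks are read as halves (assumption about the printed fractions).
SURVEY_RANK = [5, 6.5, 6.5, 1, 3.5, 8, 2, 3.5]
MEDIAN_TIME_RANK = [5, 1, 7, 6, 2.5, 8, 2.5, 4]
MEAN_TIME_RANK = [4, 2, 7, 6, 3, 8, 1, 5]


def rank_difference_rho(x, y):
    """Rank order correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def rho_to_pearson(rho):
    """Convert a rank order correlation to an equivalent Pearson r."""
    return 2 * math.sin(math.pi * rho / 6)


rho_median = rank_difference_rho(SURVEY_RANK, MEDIAN_TIME_RANK)
rho_mean = rank_difference_rho(SURVEY_RANK, MEAN_TIME_RANK)

print(f"survey vs. median time: rho = {rho_median:.2f}")
print(f"survey vs. mean time:   rho = {rho_mean:.2f}")
print(f"equivalent r: {rho_to_pearson(rho_median):.3f}, {rho_to_pearson(rho_mean):.3f}")
```

Under these assumptions the two coefficients come out at .32 and .40, the values given in the note to Table 6, and the converted values fall within rounding of the printed .33 and .42.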


VIII. INVESTIGATION ON A NUMBER OF ADVERTISEMENTS BY MEANS 
OF A TRIAL SURVEY 


Since the reliability of the results of the survey may be ques- 
tioned, an investigation of a similar kind was undertaken on 
a smaller scale to determine, if possible, what meaning could
be attached to evaluations resulting from the survey method. 

Two issues of the Saturday Evening Post were obtained, one
an old copy which appeared during the previous month, and
one a new copy not yet placed on the newsstands for distribution.
From each issue, six full-page and one two-page advertisements
were chosen. The seven selected from each issue
were arbitrarily classified as ‘‘high’’ or ‘‘low’’ in recognition
value, the judgment in each case being based upon whether the
advertisement contained copy similar to that previously used
by the same advertiser. If it did, the advertisement was considered
‘‘high’’ in recognition value; if not, it was considered
‘‘low.’’ In preparation for the experiment, the covers of all
magazines were first removed to prevent date identification.


TABLE 6

A Measure of the Extent of Agreement, on Certain of the Advertisements
Included in this Study, between Their Relative Evaluations by
Means of a Nationally Known Survey, and Their Relative Evaluations in
Terms of the Median and Mean Length of Time Spent on Each Advertisement

Order of merit based on:

               Percentage of         Median length of      Mean length of
               the women inter-      time spent on         time spent on
               viewed on survey      each advertise-       each advertise-
               who observed the      ment by 52            ment by 52
               advertisement         women                 women

Hastings ....        7                    5                     5
Coffee ......                             3                     3
Squibb ......                             7                     7
Kools .......                             4                     4
Conoco ......                             1                     1
Pyroil ......                             8                     8
Arrow .......                             2                     2
Bayer .......                             6                     6

               Percentage of         Median length of      Mean length of
               the men inter-        time spent on         time spent on
               viewed on the         each advertise-       each advertise-
               survey who ob-        ment by 48            ment by 48
               served the ad-        men                   men
               vertisement

Hastings ....        5                    5                     4
Coffee ......        6½                   1                     2
Squibb ......        6½                   7                     7
Kools .......        1                    6                     6
Conoco ......        3½                   2½                    3
Pyroil ......        8                    8                     8
Arrow .......        2                    2½                    1
Bayer .......        3½                   4                     5

For women, the rank order correlation between the survey and either
the median or mean length of time spent on each advertisement is:
ρ = .13; the corresponding Pearsonian r is: r = .14; and the standard
error of the r is: σ_r = .36.

For men, the rank order correlation between the survey and the median
length of time spent on each advertisement is: ρ = .32; the corresponding
Pearsonian r is: r = .33; and the standard error of the r is: σ_r = .33. The
rank order correlation between the survey and the mean length of time
spent on each advertisement is: ρ = .40; the corresponding Pearsonian r
is: r = .42; and the standard error of the r is: σ_r = .30.


TABLE 7

Results of the Sample Survey on the ‘‘Recognition’’ of old and new
advertisements

                                        35 men              108 women
                                     Number     %          Number     %

Old Saturday Evening Post
(Issue of April 8, 1939)
*Champion Spark Plug ...........       29      83%           69      64%
*DeSoto Motor Car ..............       18      52%           64      59%
*Chesterfield Cigarette ........       31      87%           94      87%
 American Gas Association ......        7      20%           46      43%
 Barre Granite Association .....        7      20%           32      30%
 Capital Stock Insurance .......       11      31%           20      19%
*Goodyear Tire and Rubber ......       26      75%           61      57%

New Saturday Evening Post
(Issue of May 13, 1939)
 Philco Air Conditioning .......        7      20%           21      19%
*AC Spark Plug .................       16      46%           46      43%
*LaSalle Motor Car .............       17      49%           58      54%
 American Institute of Baking ..        6      17%           34      32%
*Gulfpride Oil .................       14      40%           53      49%
 Underwood Typewriter ..........       24      69%           67      61%
*Goodyear Tire and Rubber ......       21      59%           41      38%

* These advertisements are the ones arbitrarily chosen before the survey
as ‘‘high’’ in recognition value. After the results of the survey had
been obtained, the Underwood Typewriter advertisement was added to
the ‘‘high’’ group.

A group of 35 men and 108 women students served as sub- 
jects. After instructions to indicate the responses on an 
answer sheet had been given, the fourteen advertisements were 
shown one by one. With each advertisement the question was 
asked: ‘‘Have you or have you not seen this advertisement
before?’’

The returns on this survey are given in Table 7. Presum- 
ably none of those questioned had previously seen the adver- 
tisements in the issue of May 13th; yet from 17 per cent to 69 
per cent of the men and from 19 per cent to 61 per cent of the 
women affirmed they had already seen the advertisements in
this issue. The percentage of affirmative responses expected
on the issue of May 13th was zero, but the obtained percentages 
varied from 17 per cent to 69 per cent. Clearly, in terms of 
the percentage of those interviewed who ‘‘have seen’’ a given 
advertisement, the copy was over-evaluated on the survey. 
Those advertisements arbitrarily chosen as ‘‘high’’ in recogni- 
tion value yielded consistently more responses in the affirma- 
tive than those supposedly ‘‘low’’ in recognition value, with 
one exception. This was the Underwood Typewriter adver- 
tisement, which the survey indicated was among the ones that 
were ‘‘high’’ in recognition value. 

For the purpose of making a number of comparisons be- 
tween those ‘‘high’’ and ‘‘low’’ in recognition value, the 
advertisements from each issue were grouped as follows: 


Issue of April 8, 1939
  ‘‘High’’ in recognition value:
    Champion Spark Plug
    DeSoto Motor Car
    Chesterfield Cigarettes
    Goodyear Tire and Rubber Company
  ‘‘Low’’ in recognition value:
    American Gas Association
    Barre Granite Association
    Capital Stock Insurance Company

Issue of May 13, 1939
  ‘‘High’’ in recognition value:
    AC Spark Plug
    LaSalle Motor Car
    Gulfpride Oil
    Underwood Typewriter Company
    Goodyear Tire and Rubber Company
  ‘‘Low’’ in recognition value:
    Philco Air Conditioning
    American Institute of Baking

Of the advertisements taken from the issue of April 8th, the
four receiving the highest percentage of responses in the
affirmative were included in the ‘‘high’’ group; and the three
receiving the lowest percentage of responses in the affirmative
were included in the ‘‘low’’ group. Of the advertisements
taken from the issue of May 13th, the five highest
were considered ‘‘high’’; and the two lowest were considered
‘‘low.’’

The average percentages of affirmative responses given by
both men and women were then determined for each of these
groups and the results compared, as shown in Tables 8 and 9.
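
The averaging step can be sketched as follows. The sketch assumes that each group average is the simple mean of the percentages printed in Table 7; the text does not say so explicitly. The names TABLE_7 and group_means are illustrative only.

```python
# Percentages of "have seen" responses copied from Table 7, as (men, women).
TABLE_7 = {
    "old": {  # issue of April 8, 1939
        "high": [(83, 64), (52, 59), (87, 87), (75, 57)],
        "low": [(20, 43), (20, 30), (31, 19)],
    },
    "new": {  # issue of May 13, 1939
        "high": [(46, 43), (49, 54), (40, 49), (69, 61), (59, 38)],
        "low": [(20, 19), (17, 32)],
    },
}


def group_means(rows):
    """Return (men_mean, women_mean) of the percentages in one group."""
    men = sum(m for m, _ in rows) / len(rows)
    women = sum(w for _, w in rows) / len(rows)
    return men, women


for issue, groups in TABLE_7.items():
    for level, rows in groups.items():
        men, women = group_means(rows)
        print(f"{issue:>3} {level:>4}: men {men:.2f}%  women {women:.2f}%")
```

Computed this way, the group means agree with the first two rows of Table 8 to within about a tenth of a percentage point, which suggests, though it does not prove, that the averages were taken over the rounded percentages.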

Referring to Table 8, it will be noted that, for both men and
women, the percentage of persons who ‘‘have seen’’ the advertisements
‘‘high’’ in recognition value differs significantly
from the percentage of those who ‘‘have seen’’ the ones ‘‘low’’
in recognition value; this holds true whether these advertisements
appeared in the issue of April 8th, which may have been
seen before, or in the new issue of May 13th, which had not
been seen previously. Apparently it made little difference
whether the subjects had or had not seen these particular
advertisements before. In either event, both men and women
appeared to react in the same way to the question: ‘‘Have you
or have you not seen this advertisement before?’’

TABLE 8
Statistical Evaluation of the Results of the Survey

                                         Old advertisements    Previously unseen
                                         in the issue of       advertisements in the
                                         April 8, 1939         issue of May 13, 1939
                                         Men       Women       Men       Women
Average percentage of respondents who
  “have seen” those advertisements
  “low” in recognition value             23.6%     30.6%       18.5%     25.5%
Average percentage of respondents who
  “have seen” those advertisements
  “high” in recognition value            74.2%     66.7%       52.6%     49.0%
Average percentage of respondents who
  “have seen” all of the advertisements  52.6%     51.3%       42.8%     42.3%
Difference between the average
  percentage of those who “have seen”
  the advertisements “high” and “low”
  in recognition value                   50.6      36.1        34.1      23.5
σ of the difference between the average
  percentage of those who “have seen”
  the advertisements “high” and “low”
  in recognition value                   10.3      6.15        7.45      4.80
Significance of this difference
  expressed as a critical ratio          4.91      5.86        4.57      4.89
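
The last row of Table 8 can be checked from the two rows above it. The sketch below assumes the usual definition of the critical ratio, the difference divided by the standard error (σ) of the difference, and takes both quantities exactly as printed. The dictionary and function names are illustrative.

```python
# (difference, sigma of the difference) as printed in Table 8.
TABLE_8 = {
    "April 8 issue, men": (50.6, 10.3),
    "April 8 issue, women": (36.1, 6.15),
    "May 13 issue, men": (34.1, 7.45),
    "May 13 issue, women": (23.5, 4.80),
}


def critical_ratio(difference: float, sigma_difference: float) -> float:
    """Difference divided by the standard error of the difference."""
    return difference / sigma_difference


for label, (difference, sigma) in TABLE_8.items():
    print(f"{label}: {critical_ratio(difference, sigma):.2f}")
```

The ratios obtained this way match the printed values to within one unit in the second decimal place; the small discrepancies are what one would expect if the printed differences and σ values had themselves been rounded.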

In Table 9 is given a number of comparisons between the
old and new issues of the Saturday Evening Post, for both men
and women. From the table, it may be seen there were no
significant differences between the old and new issues in the
percentages of men and women who report having seen the
advertisements. In other words, they were nearly as apt to
report having seen the advertisements in the new issue as they
were to report having seen those in the old. It may be that
evaluations in terms of the percentage who report having seen
an advertisement are expressions of the cumulative worth of
an advertising program, but an evaluation of any given advertisement,
in those terms, is clearly spurious, and the questionnaire
survey method of investigation seems to be an invalid
one for this purpose.

TABLE 9
Statistical Evaluation of the Results of the Survey (Continued)

                                                         Men      Women
Difference between the old and the new issues of the
  Saturday Evening Post in the average percentage
  of those who “have seen” the advertisements
  “high” in recognition value                            21.6     17.7
σ_diff.                                                  14.9      7.8
Significance of the difference expressed as a
  critical ratio                                         1.45     2.28
Difference between the old and the new issues of the
  Saturday Evening Post in the average percentage
  of those who “have seen” the advertisements
  “low” in recognition value                             5.16     5.16
σ_diff.                                                  4.99     3.80
Significance of the difference expressed as a
  critical ratio                                         1.03     1.35
Difference between the old and the new issues of the
  Saturday Evening Post in the average percentage
  of those who “have seen” the advertisements both
  “high” and “low” in recognition value                  9.72     9.00
σ_diff.                                                 11.30     6.33
Significance of the difference expressed as a
  critical ratio                                          .86     1.42


IX. SUMMARY 


The eye camera and method of investigation described in 
this study appear to conform to the requirements set forth at 
the beginning of this article, namely, 
(1) The equipment is light and portable.

(2) The reading material is placed at the distance of most 
distinct vision from the reader. 

(3) The reader has an unrestricted view of the reading 
material. 

(4) The reading situation is a normal one with the reader 
at leisure to leaf freely through the reading material. 

(5) The photographic record is easily interpreted.

(6) Interpretation is sufficiently accurate for differentia- 
tion between adjacent areas as large as two inches square 
within an advertisement. 

(7) The method and results appear to be reliable and valid. 

(8) The reading material is continuously identified, frame 
by frame, on the film itself. 

The reliability and validity of interpretation of eye movement
pictures, taken by the method described, are high. Rank
order correlations between the film interpretation of each of
four judges and the reported sequence of a practiced reader
of whom eye movement pictures had been taken were: .99; .99;
.99; and .98. In other words, interpretation of the film is a
valid measure of the manner in which the reader looked at the 
reading material. 

The reliability of the way in which one hundred people 
looked at the reading material was likewise high. In terms 
of the median length of time spent on each advertisement, car- 
toon, or poem, the reliability coefficients were: for men, .89; 
for women, .91; and for readers without reference to sex, .96. 
In terms of the mean length of time spent on each advertise- 
ment, cartoon, or poem, the corresponding reliability coeffi- 
cients were: for men, .82; for women, .90; and for readers 
without reference to sex, .91. These coefficients are all suffi- 
ciently high to permit the conclusion that this means of inves- 
tigating the attention value of an advertisement is both reliable 
and valid for the purpose intended. 

The low correlations obtained between attention value, as
measured by the length of time spent in looking at copy, and
the percentage of those men and women who reported they had
seen the same advertisements when interviewed on a nationally
known survey, indicated that the two methods did not give
measures of the same thing.

An investigation of a number of advertisements by means 
of a trial survey points to the conclusion that the percentage of 
those interviewed who report having seen a given advertise- 
ment is less a measure of the attention value of the advertise- 
ment in question than it is a measure of the cumulative worth 
of preceding advertisements for the same product. In other 
words, the ‘‘Have you seen? Have you read?’’ type of
questionnaire survey does not furnish an evaluation of individual
copy. 

This lack of agreement between the eye camera and survey
methods of evaluation, together with the affirmative findings
of this investigation, is, moreover, evidence for the validity of
the eye camera technique as a method for determining the
attention value of insertions of specific copy.
THE notable rise of interest during recent years in polls
of public opinion has raised the question of their possible
effect in changing or determining belief. The matter is
particularly prominent in the minds of those interested
in polls of presidential preference: for instance, there
have circulated at various times rumors that Congress will be
asked to abolish such polls. Opponents of presidential polls
argue principally in terms of the ‘‘bandwagon theory,’’ which
holds that publication of the results of presidential polls
influences voters in the direction of the leading candidate.
Critics of this theory, however, point out that changes in
preference between the occasion of initial publication of the results
of a poll and the time of election fail to show evidence of such
an influence. Gallup (2) has reported the apparent failure
of Literary Digest polls and the poll of the American Institute
of Public Opinion to shift preference for candidates for the
Republican nomination in the direction required by the
bandwagon theory. Robinson (7) has cited similar cases which do
not disprove the hypothesis definitely but ‘‘certainly argue
against a too easy acceptance of the bandwagon theory.’’¹

In addition to its practical implications, information on the
influence exerted by public opinion polls should have considerable
bearing upon the more general problem of the effect of
majority opinion. Previous studies have demonstrated that
this prestige factor markedly influences attitudes and opinions
in a number of areas of human behavior. Among such studies
are those of Moore (6), Wheeler and Jordan (10), Barry (1),
Turtletaub (9), and Marple (5).

¹ This point is given more detailed consideration later.

Kulp (4) reports a study of the influence of majority and 
expert opinion in which is introduced a novel control. He 
employed Harper’s liberalism-conservatism test consisting of 
71 propositions on social, economic, political, religious, inter- 
national, and educational problems. The 343 graduate stu- 
dents used as subjects were divided into six experimental and 
control groups ranging in size from 48 to 73 respondents. 
After one application of the test these groups were retested 
under different conditions. The control group failed to change 
significantly. One group (Group I) was given a separate sheet 
on which was indicated a response for each item with the state- 
ment that these responses represented the majority opinions of 
lay citizens. This group shifted 8.55 points toward liberalism, 
the direction of the suggested responses. Another group 
(Group IV) was retested under the same conditions, i.e., with 
the items marked in the same way, with the exception of the 
failure to give any instructions as to the source of the sug- 
gested responses. This group, surprisingly enough, shifted 
6.72 points in the same direction. The critical ratio of each 
of these differences was greater than ten. Perhaps the differ- 
ence of almost two points represents the influence of majority 
opinion in excess of the influence of the mere presentation of 
indicated responses. It may be that previous studies have 
overestimated the influence of majority opinion by failing to 
control whatever factors were responsible for this shift in 
Group IV. On the other hand, a substantial proportion of 
the subjects in Group IV may have assumed that the suggested 
answers on their blanks did represent majority belief. The 
complexity of the causes of this shift is suggested by the un- 
expected direction of the change in another group. Retested 
with the conservative responses indicated but not identified as
such, the members of this group tended to respond negatively, 
and the mean score changed toward liberalism (4). 

Evidently, if certain possible responses are differentiated in
any way, even by the mere presence of markings, from the 
other possible responses this differentiation may alter the re- 
sponses obtained. Consequently, every precaution must be 
taken to prevent the subjects’ knowing that certain responses 
are being suggested by the investigator. In the present study 
this precaution was taken by disguising the purpose of the 
investigation and by attempting to prevent the possibility of 
the subjects’ associating the announcement of the poll used 
with the purpose of the study. 

It may be said at the outset that the unit of measurement 
adopted will fix the degree to which an investigation of this 
problem can be compared to previous studies of majority 
opinion. This is true because these studies have measured 
change by means of instruments which can be made sensitive 
to very slight shifts in belief, such as attitude scales. While 
the same method of measurement is equally applicable to our 
problem, there is an alternative method of recording only 
changes of large enough magnitude to modify overt behavior, 
i.e., to alter a vote. This method, while it precludes direct 
comparisons with other studies, has the advantage of carrying 
greater practical meaning. 

The 1936 presidential election saw public opinion polls reach 
a hitherto unprecedented popularity. The largest of the polls 
in terms of votes cast was that conducted by the Literary 
Digest. Special interest attached to it at the time because of 
the belief of a number of authorities that a sampling error was 
leading the poll’s interpreters to a false forecast. Conducting 
its poll by mail only, the Literary Digest relied chiefly on lists 
of telephone subscribers and automobile owners and, as a re- 
sult, obtained a sample heavily weighted by the upper economic 
class. 

Three additional polls, much less well-known at the time, 
made forecasts of a different outcome. These polls differed 
principally from the Literary Digest poll in their use of
systematic sampling techniques and of a much smaller number
of votes.

The fact that approximately 2,000,000 votes had been 
counted by the Literary Digest seemed to constitute for lay- 
men an excellent example of strong majority opinion. The 
additional fact that other polls were of a different majority 
opinion increased the probability that positive results could be 
interpreted largely as the effect of contact with this particular 
poll. On the other hand, of course, it added to the difficulties 
of interpretation in the case of negative results. 

Experiment 1. The first of the two experiments to be re- 
ported in this paper measured the immediate after-effects of 
knowledge of the Literary Digest poll’s results. Equivalent 
experimental and control groups of subjects were chosen, only 
the former group being given the poll results prior to taking 
a vote. 

The subjects were 699 University of Minnesota students:
549 from the men’s and women’s dormitories; and 150 from
‘‘How to Study’’ classes. The division of students into control
and experimental groups in the two dormitories was facilitated
by the fact that, in each case, there were two dining halls
of approximately the same size. The socio-economic constitution
of these paired eating groups as well as that of the paired
classes was practically identical.

The experimental group, then, was composed of one of the
men’s dining units, one of the women’s dining units, and two
‘‘How to Study’’ classes. The corresponding dining halls and
classes constituted the control group.

Essentially the same instructions were read to all groups. 
Only the experimental groups heard the italicized paragraph. 
The instructions read : 

‘‘As an extension of the Presidential Polls conducted by the
papers of Pioneer Hall and Sanford Hall, you are asked to
cooperate in a straw vote.

‘‘Polls of this kind are being conducted all over the country.
The largest of these polls is the Literary Digest Poll with a
total of over two million votes. As you probably know, Landon
is leading in the Literary Digest Poll with a total of 54
per cent of the votes cast. Roosevelt is second with a total of
40 per cent.

‘‘You simply fill in your choice for president, your school,
and your class year on these ballots. Do not write your name.
Indicate your sex by printing M or F for male or female.’’

For each of the paired experimental and control groups the 
same experimenter was used. Immediately following the in- 
structions each subject noted on a secret ballot his presiden- 
tial preference, college and class. 

Table 1 shows the result of this balloting in terms of the 
percentage of the subjects who voted for Landon. When the 
control and experimental groups are compared it is seen that 
7.31 per cent more of the experimental group, those who heard 
the poll results announced, preferred Landon.² This difference
has a critical ratio of 1.95³ and is approximately at the
.05 significance level.⁴

Since the experimental and control groups may be assumed
to have differed from each other before the experiment as the
result of chance factors only, and since the difference between
the two groups in the proportion favoring Landon has been
shown to have only 5 chances in one hundred of arising by
chance, we may conclude that knowledge of the results of the
poll increased the number of subjects preferring Landon. If 7
per cent represents the true influence of the announcement of
the results of the poll, we should judge polls of opinion to be of
considerable practical significance under conditions similar to
those of this experiment. These conditions, of course, are that
the poll results were announced in a group situation and the
vote taken in the same situation immediately thereafter.
Obviously such is not the case in a presidential election.

² The percentages reported represent a proportion of the total votes
cast. All calculations were repeated using only the ballots cast for the
two major candidates. However, since identical conclusions were reached,
this second set of figures is not reported.

³ The critical ratio is the difference divided by the standard error of
the difference. To find the standard error the following formula was
used:

$$\sigma_{\text{diff.}} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

in which $\sigma_{\text{diff.}}$ is the standard error of the difference between two
proportions, p is the proportion voting for Landon in one group, q is 1 − p for
the same group, and n is the number of subjects in the group; the subscripts
1 and 2 denote the two groups compared.

⁴ All calculations were repeated using groups matched for sex and number
of years spent in college. In none of the analyses, however, did the
matching yield results that would alter the conclusions. Evidently these
two factors were not very effective controls, probably because the manner
of selecting the control and experimental groups yielded two fairly similar
and relatively homogeneous groups.

446 STUART W. COOK AND ALFRED C. WELCH

TABLE 1
Per Cent of 699 Subjects Favoring Landon Immediately Following the
Announcement of the ‘‘Literary Digest’’ Poll Results

                         Per cent favoring                Diff.
Group             N      Landon               Diff.       σ_diff.
Experimental     349     59.60
                                              7.31%       1.95
Control          350     52.29
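
The critical ratio reported in Table 1 can be reproduced from the printed proportions. The sketch assumes that the formula in footnote 3 is the ordinary unpooled standard error of the difference between two independent proportions, σ_diff. = √(p₁q₁/n₁ + p₂q₂/n₂). That reading is an assumption, supported by the fact that it yields the printed ratio; the function names are illustrative.

```python
import math


def sigma_diff(p1: float, n1: int, p2: float, n2: int) -> float:
    """Unpooled standard error of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def critical_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between the proportions divided by its standard error."""
    return (p1 - p2) / sigma_diff(p1, n1, p2, n2)


# Table 1: proportion favoring Landon and number of subjects in each group.
experimental = (0.5960, 349)
control = (0.5229, 350)

cr = critical_ratio(*experimental, *control)
print(f"difference     = {experimental[0] - control[0]:.4f}")
print(f"critical ratio = {cr:.2f}")  # 1.95
```

The same two functions apply unchanged to the later tables of the paper, since each compares the proportion favoring Landon in two independent groups.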

Experiment 2. The second experiment had two objectives: 
(1) analysis of the effect of giving a poll’s forecast to individ- 
uals previously unacquainted with the results of the poll; and 
(2) analysis of the presidential preference of groups already 
knowing the results of various polls. 

The subjects were 256 students in seven sections of the Uni- 
versity of Minnesota elementary laboratory course in psychol- 
ogy. Of these students, 41 per cent were males; 59 per cent, 
females. Five weeks before election day these students were 
asked by one of the authors to write answers to the following 
questions, presented one at a time: 

1. Do you intend to vote in the coming presidential election?

2. If you do, for whom will you vote for president?

3. If you do not, for whom would you vote if you were
going to vote?

4. Do you know who is ahead in the Literary Digest poll?

5. Who is ahead in the Literary Digest poll? Don’t guess;
if you do not know, say so.

After the answers to these questions were collected, it was 
explained to the subjects that the experimenter was making a 
survey to discover the number of people who followed polls of 
presidential preference. The per cent of votes held by Landon
and by Roosevelt in the Literary Digest poll was then put
on the blackboard ‘‘in case’’ the students ‘‘were interested.’’
The experimenter also ‘‘incidentally’’ noted the fact that ‘‘the
Literary Digest poll is the largest of its kind, with a total of
over two million votes.’’

After a lapse of between three and four weeks the regular 
instructor in each section, not either of the writers, obtained 
a second vote by carrying through the following procedure: 

‘‘We shall now collect data to be used later in the quarter.

1. Put your name on the first line of the sheet. Under it
write the name of the candidate for whom you would
vote for president.

2. Now, on the left side of the page, number from one to
four. After (1) write the name of the candidate who
is leading in the Literary Digest Poll. If you do not
know who is leading, write D.K. (don’t know).

3. After the next three numbers write the names of any
presidential polls with which you are familiar. After
each poll you name, write the name of the candidate
leading in that poll.’’

It will be noted that this experimental procedure differed 
from that used in many of the studies on the influence of ma- 
jority opinion in the important respect that the purpose of the 
investigation was concealed. The results of the poll were an- 
nounced as an incidental fact of possible interest. The second 
vote was collected, without forewarning, by a different experi- 
menter and under conditions so familiar to the subject that 
both opportunity and cause for associating the second vote- 
taking with the earlier announcement of poll results were 
greatly minimized. 
The first analysis of the data gathered in the second experiment
represents an attempt to find more critical evidence on
the validity of the bandwagon theory than the usual evidence 
in the form of the gross changes in presidential preference 
after the publication of results of straw ballots. The failure 
of the publication of the results of the Literary Digest poll to 
reverse the trend of popular favor toward Roosevelt may be 
accepted as evidence that knowledge of these results was not 
a factor potent enough to nullify entirely the effect of the other 
factors which favored Roosevelt. However, this evidence is 
not critical in establishing that knowledge of these results had 
no influence. Perhaps the trend toward Roosevelt would have 
been more pronounced if there had been no such poll. Per- 
haps most of the influential factors were shifting voters from 
Landon to Roosevelt, so that the fewer voters drawn from 
Roosevelt to Landon as the result of the poll were not numer- 
ous enough to offset the major movement. If, however, it 
could be established that knowledge of results of the poll was 
an important factor among those voters who did shift to Lan- 
don, we might conclude that the publication of the results of 
the poll did have some effect on presidential preference. 

Group B of Table 2 is composed of subjects who voted for 
Roosevelt on the first vote and who already knew the results 
of the Literary Digest poll at the time of the first ballot. Of 
these subjects, 1.75 per cent voted for Landon on the second 
ballot. Initial familiarity with the results of the poll cannot 
be credited for this shift because the subjects were familiar 
with the poll before their original votes for Roosevelt. The 
members of Group A, on the other hand, were unacquainted 
with the poll when they voted for Roosevelt originally. After 
they had been exposed to the results of the poll at least once, 
5 per cent shifted to Landon. A statistically significant difference
between these two percentages might be suggestive of
the influence of knowledge of the results of the poll in affecting
the shifts from Roosevelt to Landon that did occur. Obviously,
however, the difference between the two percentages of
Table 2 might be a chance difference.











EFFECT OF POLLS OF PUBLIC OPINION 449 


TABLE 2
Per Cent of 117 Subjects Changing Vote on Second Ballot; Grouped
According to First Vote and Knowledge of Poll Results

                                         Per cent changing
                                         to Landon after
                                         special exposure to                Diff.
Group                              N     results of ‘‘Literary    Diff.     σ_diff.
                                         Digest’’ poll
A. Roosevelt voters unacquainted
   with Literary Digest poll
   before first vote               60    5.0
                                                                  3.25%     1.02
B. Roosevelt voters acquainted
   with Literary Digest poll
   before first vote               57    1.75





Can this failure to obtain evidence favorable to the band- 
wagon theory be reconciled with the positive results obtained 
in Experiment 1? Alternative interpretations are possible. 
First, perhaps the relatively permanent effects of the announce- 
ment are less striking than the shifts immediately after ex- 
posure of the whole group to the announcement. Second, 
perhaps the true influence of the poll is positive but slight. 
Then a more sensitive technique, either in terms of the size 
of the sample or of the measuring instrument used, would 
establish the influence of the poll. 

In a somewhat similar experiment, Whisler and Remmers 
(11), using a Thurstone-type scale of attitudes toward any 
social institution, measured attitudes toward the two major 
political parties. The factor intervening between the test and 
the retest was the most authentic statement of majority opin- 
ion: the results of the actual election. Between the test imme- 
diately preceding the presidential election and the retest three 
weeks after the election the mean rating of 166 high school 
students shifted slightly toward the favorable end of the Democratic
scale and in the unfavorable direction on the Republican
scale. The critical ratio in the latter case exceeded the .05 
level of significance. In another study Thomsen (8) used the 
Beyle Thurstone-type scale to measure favorableness of atti- 
tude toward Landon and Roosevelt before the election, shortly 
after the election, and two weeks after the election. The data 
presented make it possible to calculate the mean scale-position 
for each application of the scale. Although the direction of 
the changes confirms the results of Whisler and Remmers, the 
magnitudes of the changes are too small to preclude an explana- 
tion on the basis of chance factors alone. 

Evidently, the common trend of these studies, including the 
present investigation, is to support the existence of a slight 
bandwagon effect, its influence being positive but relatively 
weak. Consequently, it appears that future studies to test 
the bandwagon theory by the practical method of the straw 
ballot must be prepared to use fairly large samples. 

The second analysis of the second experiment was based 
upon the following reasoning: if a given poll is an important 
factor in influencing preference the candidate reported as lead- 
ing should be preferred to a greater extent among people 
acquainted with the poll than among those who do not know
its results. Table 3 shows the differences in preference when 
the subjects are classified according to their knowledge of the 
results of the different polls. One hundred forty-six of the 
subjects (Group A) knew only the results of the Literary 
Digest poll while the remaining 110 (Group B) knew either of 
polls favoring both candidates, of a poll favoring Roosevelt, 
or of no polls. Of the former group, 50 per cent favored Lan- 
don as against 30.91 per cent of the latter; the difference, 
19.09 per cent, has a critical ratio of 3.16, thus exceeding the 
.01 significance level. 

As seen in the table, approximately the same magnitude of
difference exists between Group A and each of two subdivisions
of Group B: (a) subjects who knew the results of
polls predicting the election of both candidates, and (b) subjects
who were ignorant of the results of all polls. Although,
due to the smaller frequency, the probability that the difference
is a true one is decreased in both these cases, it still
exceeds the .05 level of significance.

TABLE 3
Per Cent of 256 Subjects Favoring Landon; Grouped According to
Knowledge of Results of Various Polls of
Presidential Preference*

                                          Per cent
                                          favoring                       Diff.
Group                               N     Landon     Diff.               σ_diff.
A. Subjects knowing only the
   Literary Digest Poll            146    50.00
B. All subjects not in Group A     110    30.91      A-B = 19.09%        3.16
   a. Subjects knowing polls
      favoring both candidates      65    32.31      A-a = 17.69%        2.49
   b. Subjects knowing no poll
      results                       42    30.95      A-b = 19.05%        2.31
   c. Subjects knowing only polls
      favoring Roosevelt             3     0.00

* The high percentage of the subjects who were familiar with the
results of the Literary Digest poll at the time of the second vote is not
a true indication of the percentage for students in general because the
subjects had been given an announcement of the results of the poll after
their first vote as part of the procedure.
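
The statements about significance levels can be related to the critical ratios by treating each ratio as a normal deviate and taking the two-sided probability of a deviate at least as large. The normal approximation and the two-sided convention are both assumptions of this sketch, although they agree with the authors' description of a ratio of 1.95 as lying approximately at the .05 level. The function name is illustrative.

```python
import math


def two_sided_p(critical_ratio: float) -> float:
    """Two-sided probability of a normal deviate at least this large."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))


# Critical ratios as printed in Tables 1 and 3.
reported = [
    ("Table 1, experimental vs. control", 1.95),
    ("Table 3, A-B", 3.16),
    ("Table 3, A-a", 2.49),
    ("Table 3, A-b", 2.31),
]

for label, cr in reported:
    print(f"{label}: critical ratio {cr:.2f}, p = {two_sided_p(cr):.4f}")
```

On this reading the A-B difference falls below .01, while the A-a and A-b differences fall between .01 and .05, in keeping with the levels quoted in the text.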

While, in view of these facts, we may be reasonably sure of
an association between poll knowledge and presidential preference,
we are still at a loss as to ‘‘which came first.’’ Knowledge
of the Literary Digest poll may have influenced presidential
preference; on the other hand, preferring Landon may
have led to a greater attention to and better memory of the
favorable results of that poll. There is in addition, of course,
a third possibility, namely, that the same factors which led to
the difference in preference also, by allowing for an unequal
exposure to the Literary Digest, led to the greater familiarity
with that magazine’s predictions. The principal objection to
this is the fact that all local newspapers and radio stations
carried weekly announcements of the poll as part of their
campaign news.

An argument in favor of the interpretation that knowledge 
of the poll influenced preference might be proposed if the fol- 
lowing assumptions could be accepted : first, that the 30.95 per 
cent voting for Landon in section ‘‘b’’ of Group B is an accu- 
rate estimate of the percentage who would have favored Lan- 
don if there had been no polls; second, that the 32.31 per cent 
is an accurate estimate of the percentage who would have 
favored Landon if everyone had been exposed equally to polls 
favoring both candidates; and, third, that subjects favoring 
Landon are no more likely to remember the results of polls 
showing Landon in the lead than subjects favoring Roosevelt. 
Granted these assumptions, one might argue along the follow- 
ing lines : subjects knowing polls favoring both candidates have 
about the same proportion favoring Landon as subjects know- 
ing no polls; i.e., the effects of the polls cancel each other. 
Moreover, subjects knowing only the Digest poll have, as a 
group, a stronger preference for Landon than the subjects of 
either of the other two groups. Therefore, knowledge of ma- 
jority opinion communicated in the form of the results of 
presidential polls influences presidential preference. 

Logical as this argument may appear, it cannot be accepted
in the light of the doubt that is thrown upon the third assumption
by the analysis reported in Table 4. Of the subjects who
were available at the retest, 108 had been unfamiliar with the
results of the Literary Digest poll at the time of the first vote.
After the first vote, as explained previously, all of these subjects
had been given the results; and no doubt many of them
had been exposed to the results from other sources. The third
assumption demands that voting for Roosevelt or for Landon
on the first vote is independent of memory for the results of
the poll. The results presented in Table 4, however, suggest
that Landon supporters were more likely to remember the results
than Roosevelt voters. Since the critical ratio falls
slightly short of the .05 level of significance, the results of this
comparison do not invalidate the assumption with certainty.
Nevertheless, the direction of the difference is in line with the
preponderance of the evidence from studies of the present
decade. These, as summarized by Gilbert (3), show a tendency
for pleasant material to have a greater memory value,
when tested by delayed recall, than does either neutral or unpleasant
material. Consequently, the assumption that preference
for Landon does not lead to better memory for the results
of the poll cannot be accepted.


TABLE 4

Per Cent of 108 Subjects Not Knowing Results of ‘‘Literary Digest’’
Poll on First Vote Who Knew the Poll on Second Vote;
Grouped According to First Vote

Candidate favored           Per cent knowing poll                  Diff.
on original vote      N     results on second vote     Diff.      σ diff.

Roosevelt            60            58.0
                                                      17.00%       1.89
Landon               48            75.0


SUMMARY AND CONCLUSIONS 


This study reports two experiments of an exploratory nature 
designed to measure the effect of polls of presidential prefer- 
ence upon individual belief. The subjects were 955 college 
students. Change in belief was measured by straw ballot. 

In the first experiment the presidential preference of a con- 
trol group was compared with that of an experimental group 
that had been given the results of the Literary Digest poll 
immediately before the straw ballot. The difference was in the 
direction of the candidate favored by the poll and was at the 
.05 significance level. 

In the second experiment an attempt was made to determine
whether or not the poll might have been an important factor
in influencing the preference of those voters who did shift
from Roosevelt to Landon between the first and second ballots.
The results here were inconclusive.

In a further analysis of the second experiment an associa- 
tion between knowledge of poll results and presidential pref- 
erence was established; but the possibility that the common 
factor was a tendency to remember better the results of favor- 
able polls could not be refuted. 


THE dissatisfaction and maladjustment of students with
respect to the educational institution in which they are
enrolled presents a perplexing situation both to the administrator
and to the class-room instructor. A first step
toward the problem’s solution consists of the recognition of
attitudes which contribute to good and poor judgment and of
the valid detection of those attitudes in individual personalities.
Recently a questionnaire, which sought ‘‘to describe
quantitatively the attitude of pupils toward their school,’’¹
was compiled by Hugh M. Bell and administered to several
hundred high school students.

The School Inventory embraced seventy-six items purporting
to evoke responses indicative of various school relationships.
A testee might answer any single question by encircling
either ‘‘Yes,’’ ‘‘No,’’ or ‘‘?’’ printed immediately before the
item. Scoring was accomplished by tabulating the number of
disagreements between the responses of a subject and those
of the ‘‘key.’’ According to Bell, ‘‘Students who make low
scores tend to be well adapted to the school environment, they
like their teachers, enjoy their fellow-students, and feel that
the school is conducted systematically and fairly. Students
who make high scores tend to be poorly adapted to the school:
they dislike their teachers, think that the principal is unfair
with students, and sometimes express a desire to withdraw
from school.’’² Reliability, determined by the correlation of
odd-even items, was represented by a coefficient (Spearman-Brown
correction) of 0.94 ± 0.004. The Inventory was validated,
first, through the use of item analysis techniques, and
second, by the comparison of scores of selected well-adjusted
and poorly adjusted high school students.

1 Bell, H. M. Manual for the School Inventory. Stanford Univ.,
Calif.: Stanford University Press, 1937, p. 1.

The present discussion reports the administration of the 
School Inventory to a group of girls in a women’s college and 
the subsequent analysis of results obtained. Some items were 
obviously not applicable to the college situation, and others 
presumed a coeducational set up, but these eliminated them- 
selves in the statistical selection of questions which differen- 
tiated between groups scoring high and low on the test as a 
whole. Twenty-three items were chosen which appeared to be 
valid indicators of adjustment to school factors in a women’s 
college. 

PROCEDURE 


The School Adjustment Inventory was administered to 158 
first-year students of William Woods College at the end of 
their first semester in attendance. Administration was de- 
layed until this time in order that the students might have 
sufficient opportunity to grow accustomed to the new environ- 
ment. The inventory was scored by application of the sup- 
plied key and raw scores were translated into standard scores 
having a mean of 100 and a unit value of one-tenth of one 
standard deviation. 
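
The translation of raw scores into standard scores described above can be sketched as follows. The function name `standard_score` is introduced here for illustration, and the mean of 72.7 and standard deviation of 20.3 are the figures the article reports further on for the revised twenty-three-item scale; the sketch assumes a simple linear conversion, which the article implies but does not spell out.

```python
def standard_score(raw, mean, sd):
    """Linear conversion to a scale with mean 100 and S.D. 10.

    One unit on this scale equals one-tenth of a standard deviation,
    as stated in the text.
    """
    return 100 + 10 * (raw - mean) / sd


# Mean and S.D. reported later in the article for the revised scale.
REVISED_MEAN, REVISED_SD = 72.7, 20.3

for raw in (73, 50, 12):
    print(raw, round(standard_score(raw, REVISED_MEAN, REVISED_SD)))
```

For these three raw scores the rounded results agree with the entries printed in the article's conversion table; at the upper extreme the printed table runs about one point higher than the linear formula, so the table appears to use a slightly rounded step.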

Item analysis was accomplished by the selection of those
individuals scoring among the highest twenty-five per cent
and the lowest twenty-five per cent of the sample. The ‘‘Yes’’
responses of these groups were recorded and the differentiation
value of each item, in terms of the standard error of the difference,
was determined. Of the seventy-six items in the original
inventory, twenty-three were found to be discriminating when
the arbitrarily-set criterion $(p_1 - p_2)/\mathrm{S.D.}_{p_1 - p_2} = 3.5$ was applied.
These items, which appeared to be significant in estimating the
adjustment of students in a women’s college, are listed in
Table 1.

2 Ibid., p. 1.


TABLE 1

Twenty-three Discriminating Items Selected from the Bell School Adjustment
Inventory upon Analysis of the Responses of 158
First-Year Students in a Women’s College

 1. Y N ? Do you like all of the subjects you are now taking in this
          school?
 2. Y N ? Do you think that all of your teachers are ‘‘up to date’’
          in their ideas and actions?
 3. Y N ? If you were able to do so, would you like to attend some
          other school than the one you are now attending?
 4. Y N ? Do you find that some of your teachers are very hard to
          get acquainted with?
 5. Y N ? Do you think that some of your teachers feel that they
          are superior to their students?
 6. Y N ? Do some of your teachers ‘‘talk over the heads’’ of their
          students?
 7. Y N ? Are some of your courses very boring to you?
 8. Y N ? Are some of your teachers very sarcastic?
 9. Y N ? Do you have difficulty in keeping your mind on what you
          are studying?
10. Y N ? Do you think that some of your teachers expect too much
          of you?
11. Y N ? Have you experienced considerable difficulty preparing
          your lessons for your classes?
12. Y N ? Do you think that this school is run as if it were a prison?
13. Y N ? Have you been able to choose the subjects you like in this
          school?
14. Y N ? Do you think that some of your teachers show partiality
          toward certain students?
15. Y N ? Do you think that your teachers require too much work to
          be done outside the regular class period?
16. Y N ? Do you think that some of your teachers treat you as if
          you were a small child?
17. Y N ? Do you feel that most of your teachers have confidence in
          your ability to succeed?
18. Y N ? Do you find that some of your teachers make you feel as
          if you did not care whether you learned anything in
          their classes or not?
19. Y N ? Do you find that all of the teachers in this school are
          cheerful and pleasant to meet?
20. Y N ? Do you find that some of your classes are very monotonous?
21. Y N ? Do you find that some of your teachers fail to stimulate
          in you the desire to do your best work?
22. Y N ? Do you find that some of your teachers apparently take
          delight in making you feel embarrassed before the
          class?
23. Y N ? Do you think that some of your teachers show a lack of
          interest in school activities?





Table 2 shows the difference between the responses of the two 
groups divided by the standard error of the differences for 
each of the twenty-three items and portrays the relative valid- 
ity of each selected question. 

Following the item analysis of the School Inventory and
subsequent selection of discriminating items, the responses of
the entire group of 158 girls were rescored with respect to the
twenty-three questions comprising the revised inventory. The
revised adjustment scores were calculated in a manner which
differed from that originally employed by Bell. They were
determined in terms of percentage right by applying the formula
$\frac{c}{23 - Q}$, where c equals the number answered according to
the key, and Q equals the number of items for which ‘‘?’’ was
encircled. The revised scores ranged from 12 to 100, with a
Mean of 72.7 and a standard deviation of 20.3. Table 3 gives
the scores obtained on the twenty-three item inventory and the
comparable standard scores based upon 158 cases.
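
A brief sketch of the revised scoring rule may make the arithmetic concrete. The function name `revised_score` and the multiplication by 100 are assumptions made here for illustration; the article says only that the number answered according to the key is divided by twenty-three less the number of ‘‘?’’ responses and that the result is expressed as a percentage.

```python
def revised_score(correct, question_marks, n_items=23):
    """Per cent of keyed answers among the items not marked '?'.

    correct        -- number of items answered according to the key (c)
    question_marks -- number of items for which '?' was encircled (Q)
    """
    answered = n_items - question_marks
    if answered <= 0:
        raise ValueError("every item was marked '?'; the score is undefined")
    return 100 * correct / answered


print(revised_score(23, 0))   # all items keyed, none marked '?'
print(revised_score(10, 3))   # ten keyed answers out of twenty answered
```

Dividing by the number of items actually answered, rather than by twenty-three, keeps a subject who encircles ‘‘?’’ from being penalized as though the item had been answered against the key.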

TABLE 2

Differentiation Values* of School Adjustment Items Selected with Respect
to the Responses of First-Year Students in a Women’s College

Item number      $(p_1 - p_2)/\mathrm{S.D.}_{p_1 - p_2}$

     1                 5.0
     2                 4.5
     3                 3.9
     4                 4.6
     5                 3.5
     6                 3.7
     7                 4.6
     8                 6.0
     9                 3.5
    10                 4.4
    11                 4.0
    12                 4.7
    13                 4.0
    14                 3.5
    15                 4.5
    16                 3.5
    17                 3.7
    18                 4.5
    19                 3.8
    20                 4.4
    21                 5.0
    22                 4.0
    23                 3.7

* Determined by calculation of the standard error of the difference in
percentage of items answered according to the key by those individuals in
the upper one-fourth and those in the lower one-fourth of the ‘‘total
score’’ distribution.


The reliability of the School Inventory revised for use with
first year students in a women’s college was determined by
both the odd-even item and test-retest techniques. When the
odd numbered items were correlated with the even, an uncorrected
product moment coefficient of 0.62 ± 0.03 was obtained.
Correction through application of the Spearman-Brown formula
raised the reliability coefficient to 0.75 ± .02. One month
following the first administration the revised adjustment
schedule was again presented to the subjects. The test-retest
reliability as indicated by a product moment correlation coefficient
was 0.73 ± .02.
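
The Spearman-Brown correction mentioned above can be sketched in a few lines. The function name `spearman_brown` and the lengthening factor `k` are introduced here for illustration; the formula itself is the standard one for estimating the reliability of a test k times as long as the part on which the correlation was obtained.

```python
def spearman_brown(r_half, k=2.0):
    """Estimated reliability of a test k times the length of the part
    whose reliability is r_half (k = 2 for an odd-even split)."""
    return k * r_half / (1 + (k - 1) * r_half)


# Uncorrected odd-even coefficient reported in the text.
print(round(spearman_brown(0.62), 3))
```

Applied to the uncorrected 0.62, the formula gives roughly 0.77, which is near but not identical to the corrected value printed in the article. The source does not show its arithmetic, so the small gap is left unexplained here.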


TABLE 3

Raw Scores* and Comparable Standard Scores of the Twenty-Three
School Adjustment Items Selected with Respect to the Responses
of First-Year Students in a Women’s College

 Raw   Standard     Raw   Standard     Raw   Standard     Raw   Standard
score   score      score   score      score   score      score   score

 100     114         75     101         50      89         25      76
  99     113         74     101         49      88         24      76
  98     113         73     100         48      88         23      75
  97     112         72     100         47      87         22      75
  96     112         71      99         46      87         21      74
  95     111         70      99         45      86         20      74
  94     111         69      98         44      86         19      73
  93     110         68      98         43      85         18      73
  92     110         67      97         42      85         17      72
  91     109         66      97         41      84         16      72
  90     109         65      96         40      84         15      71
  89     108         64      96         39      83         14      71
  88     108         63      95         38      83         13      70
  87     107         62      95         37      82         12      70
  86     107         61      94         36      82
  85     106         60      94         35      81
  84     106         59      93         34      81
  83     105         58      93         33      80
  82     105         57      92         32      80
  81     104         56      92         31      79
  80     104         55      91         30      79
  79     103         54      91         29      78
  78     103         53      90         28      78
  77     102         52      90         27      77
  76     102         51      89         26      77

* Raw scores for the twenty-three-item scale were determined by applying
the formula number correct/(23 - Q), Q being the number of items
marked ‘‘?,’’ which yielded the score as a percentage.


COMPARISONS OF ORIGINAL INVENTORY AND THE REVISED
INVENTORY FOR STUDENTS OF WILLIAM WOODS COLLEGE

It would appear that the selected items would provide a
more valid basis for the judgment of adjustment status of
first year students in a women’s college than would the original
inventory prepared for high school pupils of both sexes. In
that the items of the revised inventory also were part of the
original, it was not surprising that the variance in score on the
two was not extreme. The Mean of the School Inventory was
12.4, S.D., 8.1, as compared to the Mean of the revision of 72.7,
S.D., 20.3. The coefficient of skewness for the former was
found to be -0.45 and for the latter -0.43. Scores on the two
forms for 158 students seemed to be related to the extent
indicated by a product-moment correlation coefficient of
0.87 ± .012.

In Table 4 the distribution of the differences between stand- 
ard scores on the original and the revised inventories is given. 
The Mean difference was found to be 3.56, S.D., 2.64. The 
Mean relative change was + 0.68, indicating a slightly higher 
average adjustment score on the revised list of questions. 


TABLE 4

Frequency Distribution of Differences between Standard Scores* Based
upon the Original Inventory and Standard Scores Based upon the
Twenty-Three Selected Items

Difference in
standard scores      Frequency

      15                 1
      14                 1
      13                 0
      12                 0
      11                 2
      10                 1
       9                 3
       8                 1
       7                 2
       6                11
       5                19
       4                12
       3                24
       2                32
       1                38
       0                11

N = 158
Mean = 3.56
S. D. = 2.64

* Raw scores of each series were translated into standard scores with a
Mean of 100 and a standard deviation of 10; unit value of the standard
score equals one-tenth of a standard deviation.
SUMMARY


Bell’s School Adjustment Inventory, purporting to indicate
school adjustment, was administered to 158 first-year students
in a women’s college. The results were studied with
regard to the differentiation value of the individual items, and
twenty-three significantly discriminating items were selected
from the original list of seventy-six. It is suggested that
scores based upon the twenty-three chosen items might serve
more successfully to indicate the adjustment of students in a
women’s college to the institution in which they are enrolled.











PRELIMINARY STUDY OF AN INDUSTRIAL 
REVISION OF THE REVISED MINNESOTA 
PAPER FORM BOARD TEST 


RICHARD S. SCHULTZ
Psychological Corporation 


THE Revised Minnesota Paper Form Board Test (Likert-Quasha)
has been used for several years as one instrument
in an experimental battery of tests for machinists,
mechanics, and engineers. In the industrial set-up certain
practical difficulties arose in administration of the test. The
printed instructions are too detailed for individuals with superior
ability and too involved for those with average or low
ability. This limitation affects the usefulness of this test as a
measure of ‘‘Visual-Spatial’’ ability and also leads to problems
in maintaining the cooperation of men being tested.

The fact that it is necessary to write a letter (A, B, C, D, E) 
is confusing to some subjects. To industrial employees with 
little schooling and infrequent opportunity to write, this is an 
unnatural task and may set up an unfavorable attitude toward 
the test. 

Tests of special abilities for industrial use must be designed 
to measure the specific ability rather than verbal comprehen- 
sion. Instructions must be simple. Neither must they neces- 
sitate comprehension of difficult verbal material nor the per- 
formance of a complicated task which is irrelevant to the ability 
being tested. 

An Industrial Revision of this test was made based on the
items in Form AA (Revised Minnesota Paper Form Board
Test). Figure 1 (the first page of the Industrial Revision)
illustrates the form of the test items, the minimum demand for
verbal instructions, and the simple response required.¹


1 Thurstone used a similar form with the original Minnesota Paper
Form Board, but did not include the multiple choice response. L. L.
Thurstone, ‘‘Primary Mental Abilities,’’ Psychomet. Monog., No. 1, 1938,
pp. 35-36.


Fig. 1. First page of the Industrial Revision. The page carries spaces
for Name and Date and the printed directions: READ DIRECTIONS
CAREFULLY. PUT A CROSS OVER THE ONE FIGURE IN WHICH THE
BLACK PIECES WILL FIT EXACTLY. The test items follow under the
words ‘‘Begin Here.’’














RESULTS 


The data presented here cover a number of the principal 
factors which must be considered in evaluating the Industrial 
Revision of the Paper Form Board. While the results are 
exploratory, they reveal directions for further study to estab- 
lish the validity and reliability of the suggested revision. 

1. Comparison of Revised Minnesota Paper Form Board and
Industrial Revision. Table 1 gives the distribution of scores
for each of the test forms.² These scores were made by two
comparable groups of industrial workers. Men in the two
groups represent a wide range of experience, education and
ability for machine and mechanical work. Table 1 indicates
that scores on the Industrial Revision are consistently higher
than scores obtained with the Minnesota revision. Analysis
of the distribution shows that only one case falls below a score
of 23 on the Industrial Revision, while 16 cases are below this
point on the Revised Minnesota. At the upper end of the distributions,
30 cases are found with a score of 43 or above on
the Industrial Revision, as compared with 19 cases on the
Revised Minnesota.

TABLE 1

Scores on Revised Minnesota Paper Form Board and
Industrial Revision

                    Revised       Industrial
Score              Minnesota       Revision

59-62                   1              3
55-58                   0              2
51-54                   5              2
47-50                   6             10
43-46                   7             13
39-42                  27             26
35-38                  29             23
31-34                  30             28
27-30                  22              8
23-26                  11              9
19-22                  10              0
15-18                   3              1
11-14                   3              0

N                     145            125
Mean                 34.3           38.2
S.D.                  8.6            7.8

Diff. mean                   4.1
S.E. diff.                   1.0
Critical ratio               4.0

2 The time limit used for both forms was 14 minutes.
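
The standard error at the foot of Table 1 can be approximated from the printed N's and standard deviations. The sketch below assumes the usual formula for the standard error of a difference between the means of two independent groups; the function name `se_diff_means` is introduced here for illustration and is not taken from the article.

```python
import math


def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# S.D. and N as printed for the Revised Minnesota and the Industrial Revision.
se = se_diff_means(8.6, 145, 7.8, 125)
# Ratio of the difference between the printed means to that standard error.
ratio = (38.2 - 34.3) / se

print(round(se, 2), round(ratio, 2))
```

The standard error comes out close to the printed 1.0. The printed means differ by 3.9, so the ratio computed this way is a little below the 4.1 and 4.0 shown in the table; the sketch simply uses the means as printed and does not attempt to resolve the difference.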
Table 2 shows the relationship between the two forms when
the tests were given in alternate order to several groups. The
correlations indicate as close a relationship between the two
forms as was found between the original Minnesota Paper
Form Board and the Likert-Quasha revision, where the lowest
correlation was .75.³


TABLE 2

Correlation between Revised Minnesota Paper Form Board
and Industrial Revision

Group                   N      r     Test order            Time

Engineering students    21    .711   Industrial            14 min. both forms
                                     Revision first
High school girls       57    .769   Industrial            14 min. Indus. Rev.
                                     Revision first        20 min. Minn. rev.
High school girls       87    .859   Minnesota             20 min. Minn. rev.
                                     Revision first        14 min. Indus. Rev.





2. Comparison of Scores on Industrial Revision Among 
Different Groups. Table 3 gives average scores on the two 
forms for four groups. The average for high school girls is 
lowest while that for engineering students is highest. This 
tends to support the contention that the Paper Form Board 
test is an index of ‘‘Visual-Spatial’’ abilities essential for 
machine and mechanical work. 


TABLE 3

Comparison of Scores of Different Groups, Industrial Revision
Taken First, Time Limit 14 Minutes

Group                      N      Mean     S.D.

Engineering students       21     39.2     8.3
Industrial workers        125     38.2     7.8
Trade school boys          42     32.3     7.1
High school girls          57     32.5     9.5





3 W. H. Quasha and R. Likert, ‘‘The Revised Minnesota Paper Form 
Board Test,’’ J. Educ. Psychol., March, 1937, 197-204. 











































3. Correlations of Scores on Industrial Revision with General
Intelligence. Previous studies have cited correlations
between the Revised Minnesota Paper Form Board and intelligence
ranging from +.28 to +.63. The correlation between
the Industrial Revision and intelligence⁴ for a group of 55 high
school girls is +.65, and for a group of 26 trade school students
is +.42.

GENERAL SUMMARY 


A preliminary study was made of an Industrial Revision of 
the Likert-Quasha Revised Minnesota Paper Form Board Test. 
The results indicate that: 

1. Scores on the Industrial Revision tend to be significantly
higher than on the Revised Minnesota when a 14-minute time
limit is used for each.

2. The Industrial Revision seems to be a valid substitute for 
the Revised Minnesota. Engineering students obtain a higher 
average score than trade school boys and high school girls who 
would be expected to score lower. 

3. The Revised Minnesota correlates from +.711 to + .859 
with the Industrial Revision. Correlations with intelligence 
correspond with those found in previous studies. 

4. Experimental use of the Industrial Revision of the Paper 
Form Board Test as part of an employment procedure indi- 
cates the desirability of further study of this form. 


4 The intelligence test used with the high school girls was the Revised 
Alpha Examination Form 8 and with trade school students was the 
Pressey Senior Classification Test. 

















VISUAL RESPONSES TO AUDITORY STIMULI 


LOUISE OMWAKE 
Centenary Junior College 


STUDIES of primitive music indicate that certain instruments
and appropriate rhythms were associated with particular
emotions in early ceremonials. A conditioning
process, probably facilitated by a natural response to sound,
resulted in the use of definite tonal stimuli for various religious
rites. As man’s means of producing music increased and his
interests broadened to include more complex institutions and
corresponding emotional reactions, the use of sound to evoke
a response grew in importance. Today, music is one of
propaganda’s most potent weapons; familiar music welds a
heterogeneous crowd into a cooperative unit; church choirs
lure indifferent persons to devotional services; music augments
the excitement and depression depicted in the movies; radio
advertising pleasantly announces itself by a theme song; lullabies
still soothe infants in spite of the child psychologist’s
injunctions to the contrary.

Slightly less important to the average person but neverthe- 
less significant, are the emotional reactions aroused by lines 
and colors in the visual field. Artists have long recognized 
the quiet, soothing effect produced by horizontal lines and 
shades of blue and green. We are the victims of our feelings 
which colorful advertising, dress materials, and soft lights 
produce in us. Through intentional human design and uncon- 
trollable nature, sound and vision prevent our being the crea- 
tures of reason which so many of our species claim for us. 


1 The writer wishes to express her appreciation to Miss Betty Backes 
for the time and suggestions which she volunteered during the experi- 
mental work. 
Granted that both visual and auditory stimuli evoke feelings,
it is reasonable to suspect that the two senses may be related
through a common emotion. A funeral march may suggest
black, and band music may bring to mind red or blue.

The problem of synaesthesia presents another question. Not 
infrequently a sound or a word is closely linked with a color; 
an odor may even have its counterpart in light or sound waves 
while a connecting emotion is not evident. The person who 
experiences synaesthesia may not be able to analyze nor explain 
his associations, so divorced are they sometimes from reason 
and conscious development. The explanation may lie in the 
conditioning mechanism which probably forms the basis for 
many visual-auditory relationships. Conditioning provides a 
reasonable basis for associating red with the siren whistle of a 
fire engine. 

It was the purpose of the present study to investigate some 
elementary principles involved in auditory-visual relationships 
and to determine the genetic development of the associations. 
The materials were necessarily simple and the method limited 
to preserve objectivity, and to assure comprehension of direc- 
tions by children as young as nine years of age. The same 
testing procedure was observed throughout the grade range 
from the fourth grade through the last year of high school. 
The five hundred and fifty-five subjects represent a public 
school in a small eastern town. 


PROCEDURE 


Each subject was given a sheet of paper like the following 
on which to record his responses. 


PIANO                                                   No. ______

Notes and Plain Colors (First Record)
    First note ______ through Eighth note ______, one blank for each
    note, under the column heading ‘‘Colors’’

Chords and Plain Colors (Second Record)
    First chord ______ through Eighth chord ______, one blank for each
    chord

Chords and Pairs of Colors (Third Record)
    One blank for each chord, under the column heading ‘‘Colors’’

Series of Notes and Figures (Fourth Record)
    One blank for each series, under the column heading ‘‘Pictures’’








The following directions designed for high school students 
were simplified, with frequent repetitions for children of the 
lower grades. Excellent cooperation was received and a spon- 
taneous interest was evoked by the unusual testing materials. 


DIRECTIONS 

We are asking for your cooperation in a new and rather 
unusual study. It will in no way be a test of your knowledge 
or intelligence, and your achievement will not reflect any of 
the generally valued personality traits. Instead, we are try- 
ing to find out if some people, and how many people tend to 
associate sound and visual objects, for example, whether low 
notes and high notes suggest different colors, whether certain 
musical selections bring to mind visual images, etc. 

We are commencing with rather elementary materials because
the study will include the responses of young children
as well as adults. The test may seem strange to you, but
please give it your serious consideration. Please do not compare
your responses with those of your neighbor until after the
test is finished.

I am going to play a note on the piano and show you four 
colors. Opposite ‘‘First Note’’ under ‘‘Notes and Plain 
Colors’’ in the first column (pause) ... write the number of 
the color which seems to go with the note (which the note 
makes you feel like, of which the note reminds you). Write 
only the number in the space provided for it. 

With appropriate substitutions these directions were used 
throughout the test. 


PART I 
Series of Notes and Figures 


The test of notes in a series and graphic presentations were 
included to determine the lowest level at which visual and 
auditory associations were made. The figures, mounted on 
cardboard, consisted of orange lines in step formation on black 
backgrounds, with a rise and fall in the pictorial pattern corre- 
sponding with the auditory rise and fall of notes in a series. 
A word picture of the figures follows: 

1. a straight, horizontal line 
2. steps from lower left to upper right 
3. steps from upper left to lower right 
4. a succession of up and down steps 
The eight series of notes were in the middle range of the piano
keyboard:

1. C, D, E, F, E, D, C
2. A repeated eight times
3. G, A, B, C, D, E, F
4. C repeated eight times
5-8. [illegible]

The subjects were asked to ‘‘write the number of the picture
that seems to go with the notes,’’ etc. The number of cases
upon which the percentages were based remained relatively
constant throughout all sections of the experiment.

The summary of the results by grades is given in Table 1. 


TABLE 1
Series of Notes and Figures

Figure and Largest Per Cent Associating It with Each Series of Notes

                                     Series
Grade    N      1       2       3       4       5       6       7       8

  4      ..   #4-69   #1-76   #4-54   #1-74   #2-64   #4-60   #4-52   #3-70
  5      52   "  87   "  84   #2-85   "  84   "  96   #3-95   "  98   "  84
  6      50   "  92   "  76   "  50   "  80   "  88   "  48   "  82   "  80
  7      52   "  98   "  91   "  60   "  98   "  91   "  61   "  81   "  84
  8      39   "  86   "  93   #3-82   "  87   "  90   "  87   "  98   "  80
  9      66   "  86   "  88   #2-87   "  83   "  77   "  88   "  97   "  67
 10      78   "  94   "  89   "  89   "  92   "  85   "  92   "  98   "  75
 11      80   "  97   "  93   "  93   "  95   "  93   "  92   "  96   "  93
 12      71   " 100   "  93   "  89   "  97   "  93   "  97   " 100   "  93

The writer is unable to explain the deviations in the case of 
the third series, and again for number six, but the very large 
per cents in all other cases suggest high agreement and an in- 
crease with age level. However, even the nine- and ten-year- 
old child readily recognizes that ‘‘up’’ in sound corresponds 
with ‘‘up’’ in space. 


Notes and Plain Colors 


The following eight notes were played on the piano in the
order given:

1. E - second E above middle C
2. G - third G below middle C
3. [?] sharp - below middle C
4. C - second C below middle C
5. [?] - below middle C
6. C sharp - second C sharp above middle C
7. D - second D above middle C
8. A - second A above middle C

With each note, four colors—red, blue, black, yellow—were 
presented on a large card with numbers to designate each. 
The colors were those sold by the Hammett Company for kin- 
dergarten materials. As each note was sounded, the subject 
wrote on the appropriate line the number of the color which 
was suggested by the sound. 

The author is aware that the order of presentation of the 
notes may have affected the quality of the response by contrast 
but the degree of consistency was, nevertheless, interesting. 
Table 2 gives the per cent of students in all nine grades who 
responded to Note No. 1 with each color. The divergence and 
agreement indicated were typical of all of the tests, but only 
the largest per cents will be presented in subsequent tables 
for economy of space. 


TABLE 2
Note No. 1 (Second E above Middle C) and Plain Colors

Per Cent of Students Associating Each Color with Note No. 1

Grade               4    5    6    7    8    9   10   11   12

Red ............   43   37   42   44   41   40   48   48   63
Blue ...........   29   21    8   10   23   11   21   10    8
Black ..........        11    2   10    6    4    1    2    1
Yellow .........   29   31   48   36   30   45   30   40   28





Summarized, the color designated the largest per cent of 
times by students of each grade was: 


Grade 4—red 43%        Grade 9—yellow 45%
Grade 5—red 37%        Grade 10—red 48%
Grade 6—yellow 48%     Grade 11—red 48%
Grade 7—red 44%        Grade 12—red 63%
Grade 8—red 41%
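
The summary above can be reproduced mechanically from the per cents in Table 2. The following is a minimal Python sketch added for illustration, not part of the original study. The names `TABLE_2`, `GRADES` and `modal_color` are hypothetical, and the fourth-grade figure for black, which does not appear in the table, is taken as 0 by assumption.

```python
# Per cents from Table 2, listed for grades 4 through 12.
GRADES = [4, 5, 6, 7, 8, 9, 10, 11, 12]
TABLE_2 = {
    "red":    [43, 37, 42, 44, 41, 40, 48, 48, 63],
    "blue":   [29, 21, 8, 10, 23, 11, 21, 10, 8],
    # Assumption: no fourth-grade figure for black is given; 0 is used here.
    "black":  [0, 11, 2, 10, 6, 4, 1, 2, 1],
    "yellow": [29, 31, 48, 36, 30, 45, 30, 40, 28],
}


def modal_color(table, grades):
    """Return {grade: (color, per cent)} for the most frequently chosen color."""
    result = {}
    for i, grade in enumerate(grades):
        color = max(table, key=lambda c: table[c][i])
        result[grade] = (color, table[color][i])
    return result


if __name__ == "__main__":
    for grade, (color, pct) in modal_color(TABLE_2, GRADES).items():
        print(f"Grade {grade}: {color} {pct}%")
```
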
Some factor seemed to operate to produce a consistency of
response definitely better than chance. Since this test was 
the first of this series to be presented, the associations were 
not given in contrast to a preceding reaction, which factor 
might well affect subsequent responses. Nevertheless, red was 
the most frequent response to the second E above middle C 
on the piano and the per cents were well above the chance 
expectancy of twenty-five per cent for each color. The only 
color which was confused with red was yellow which stands 
next to it in the color spectrum. Since the note is above the 
middle of the keyboard range, one may conclude that some 
factor operated either by conditioning or innately to cause the 
association of red and occasionally yellow with that location 
on the keyboard, but very rarely was black suggested by this 
E note. 
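
The comparison with the chance expectancy of twenty-five per cent can be made explicit with a binomial tail probability. The sketch below is an illustration added here, not a computation reported in the study. The class size of 71 is borrowed from the twelfth-grade row of Table 1 on the assumption that about the same number of pupils took the color test, and the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n = 71                 # assumed number of twelfth-grade pupils
    k = round(0.63 * n)    # 63 per cent chose red for Note No. 1
    tail = binomial_tail(k, n, 0.25)
    print(f"P({k} or more of {n} choose red by chance) = {tail:.3g}")
```

A tail probability this small would support the statement that the agreement is better than chance; the smaller per cents in the lower grades would need the same check with their own class sizes.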

Table 3 shows the color response most frequently indicated 
by students of the nine grades as each note was sounded. 

The tendency to associate a certain color with a piano note 
was definitely greater than chance, and the agreement of re- 
sponse increased with the age of the subjects. There was a 
slightly greater consistency of report in the sixth and follow- 
ing grades than at the lower levels, although even the responses 
of nine and ten year old children showed a kinship between 
color and sound reactions. Black was usually suggested by a 
low note, yellow by a high one, red by a relatively high one, 
and blue by a relatively low one. 


PART II


After the visual associations with elementary sounds were 
determined, four unidentified victrola records were chosen for 
presentation with visual materials. The musical selections 
were intended to represent a soothing lullaby, a march tune, 
a slow melancholy number and a light dancing rhythm. Care 
was taken to exclude musical numbers with which one would 
readily associate pictures through previous conditioning. The
following selections were used : 





1. ‘‘Shepherd Boy’’ by Wilson (lullaby)

2. ‘‘First Brigade March’’ by Weldon (march) 

3. ‘‘Andante Cantabile’’ by Tschaikowsky (melancholy) 

4. ‘‘Sylvia Ballet’’ by Delibes (dance) 

In some cases the records were not played in their entirety 
but the sections most representative of the desired musical type 
were repeated. The directions followed the pattern which was 
used for the notes and chords: ‘‘Write the number of the
(color line, picture, man) which seems to go with the music 
(which the music makes you feel like, of which the music 
reminds you).’’ 

As a record was played the visual materials were presented 
on large cardboard charts, four numbered choices being offered 
simultaneously in each test. A description of the visual mate- 
rials follows: 

1. Colors: 

four squares of (1) red, (2) blue, (3) black, (4) yellow 

2. Lines: 

(1) regular waves 

(2) straight, broad, horizontal bar 

(3) regular succession of zigzag lines of equal magni- 
tude 

(4) irregular zigzag pattern formed with dots instead 
of lines 

3. Pictures: 

(1) a woodcut showing prostrate figures surrounding 
a falling man in the throes of death 

(2) several girls on ice skates executing difficult 
dance steps 

(3) workmen with sledge hammers pounding railroad 
ties 

(4) a young child sleeping on his father’s shoulder 

The pictures were approximately the same size and all
were in black and white tones.

4. Stick Men: 

(1) figure with arms and legs raised in dancing posi- 
tion 
(2) figure with drooping head, shoulders and arms
(3) reclining figure 
(4) figure in marching position 
To facilitate the reader’s recognition of each response in the 
following tables, the visual stimuli will be keyed as follows: 


Colors: Lines: Pictures: Stick Men: 
1. red 1. wavy 1. death 1. dancing 
2. blue 2. straight 2. workmen 2. grieving 
3. black 3. regular 3. skaters 3. sleeping 
4. yellow 4. irregular 4. child 4. marching 


The results as summarized below give the responses which 
the highest per cent of each class associated with the several 
records. 

TABLE 4
Shepherd Boy (Lullaby) and Visual Stimuli

Visual Stimuli Which the Largest Per Cent Associated with
‘‘Shepherd Boy’’

Grade    Colors      Lines         Pictures     Stick Men

  4      blue 44     irreg. 53     child 68     sleeping 69
  5      blue 52     irreg. 54     child 90     sleeping 81
  6      blue 44     irreg. 64     child 92     sleeping 68
  7      blue 69     wavy 50       child 94     sleeping 92
  8      blue 53     irreg. 48     child 90     sleeping 94
  9      blue 53     wavy 44       child 90     sleeping 90
 10      blue 48     wavy 45       child 90     sleeping 81
 11      blue 51     wavy 65       child 96     sleeping 97
 12      blue 76     wavy 77       child 99     sleeping 96





The largest per cent of the responses of each grade level was 
given without exception to the color, blue; to the picture of a 
sleeping child; to the stick man in a reclining position. There
appeared a distinct differentiation between the lower and 
higher grades in the reaction to lines. The irregular line was 
chosen by the majority of the fourth, fifth, sixth and eighth 
grades while the older students selected the wavy line. Most 
of the per cents are far above the twenty-five per cent chance 
expectancy. 
TABLE 5
First Brigade March and Visual Stimuli

Visual Stimuli Which the Largest Per Cent Associated with
‘‘First Brigade March’’

Grade    Colors      Lines         Pictures       Stick Men

  4      blue 39     irreg. 36     skaters 92     dancing 77
  5      red 54      irreg. 41     skaters 80     marching 50
                                                  dancing 50
  6      red 52      irreg. 54     skaters 82     marching 50
                                                  dancing 50
  7      red 69      irreg. 63     skaters 81     dancing 56
  8      red 78      irreg. 58     skaters 67     marching 51
  9      red 66      reg. 48       skaters 83     dancing 71
 10      red 65      reg. 51       skaters 82     marching 49
                                                  dancing 49
 11      red 69      reg. 80       workmen 48     marching 80
 12      red 97      reg. 69       skaters 57     marching 82





Red was most frequently associated with march time; regu-
larity of line accompanied the regularity of rhythm in the
minds of the more mature subjects; the picture of the skaters
missed the unanimous vote of the nine grades by only one per
cent in the eleventh grade; dancing and march music were
confused by most of the subjects with only the older ones
recognizing the true type of rhythm.


TABLE 6
Andante Cantabile (Melancholy) and Visual Stimuli

Visual Stimuli Which the Largest Per Cent Associated with
‘‘Andante Cantabile’’

Grade    Colors       Lines           Pictures     Stick Men

  4      black 55     irreg. 42       death 71     grieving 63
  5      black 72     wavy 36         death 78     grieving 96
  6      black 84     wavy 46         death 74     grieving 80
  7      black 86     straight 53     death 92     grieving 94
  8      black 80     straight 66     death 71     grieving 90
  9      black 82     straight 59     death 70     grieving 89
 10      black 88     straight 61     death 84     grieving 84
 11      black 92     straight 84     death 98     grieving 94
 12      black 96     straight 96     death 100    grieving 98


The color of black, the picture of death, and the posture of 
grief were identified with ‘‘Andante Cantabile’’ by the large
majority of every grade. The broad, straight, horizontal line 
was agreed upon by all of the older students. The music of 
sorrow has its counterpart in line, color and idea. 


TABLE 7
Sylvia Ballet (Pizzicato) and Visual Stimuli

Visual Stimuli Which the Largest Per Cent Associated with
‘‘Sylvia Ballet’’

Grade    Colors        Lines         Pictures       Stick Men

  4      red 38        irreg. 38     skaters 50     marching 45
  5      red 40        irreg. 54     skaters 73     dancing 60
  6      yellow 38     irreg. 64     skaters 74     dancing 60
  7      blue 42       irreg. 81     skaters 62     dancing 54
  8      yellow 48     irreg. 51     skaters 66     dancing 53
  9      yellow 53     irreg. 71     skaters 66     marching 46
 10      yellow 41     irreg. 78     skaters 70     marching 50
 11      yellow 41     irreg. 95     skaters 91     dancing 73
 12      yellow 86     irreg. 83     skaters 88     dancing 88





‘‘Sylvia Ballet’’ suggests the irregular line and the picture
of skaters with pronounced unanimity, and yellow is the color 
chosen by the more mature subjects. Again, confusion exists 
as to the type of rhythm—dance or march—and the closest 
agreement is found in the last two years of high school. 


CONCLUSIONS 


In an effort to discover whether school children responded 
to certain sounds with uniform visual sensations, and whether 
there was a growth in this trait with age, the author used 555 
school children representing grades four through twelve as 
subjects. The ‘‘Sound’’ material consisted of piano and vic-
trola stimuli; the ‘‘Visual’’ material included colors, lines,
figures and pictures presented on large charts. The salient
findings were as follows: 


Part I 


1. A very close relationship was found between the sound 
pattern of eight simple series of notes and four graphic figures 
showing the rise and fall of lines corresponding to similar 
changes in sound. 

2. There was a decided tendency to associate black with low 
notes, yellow with high ones, red with relatively high ones, 
and blue with relatively low ones. 


Part II 


Sections of four victrola records were presented to give 
music suggestive of a lullaby, a march, a dance and grief. 
Four colors, four lines, four pictures and four ‘‘stick men’’ 
were shown successively to the subjects as the record was 
played. 

1. ‘‘Shepherd Boy,’’ a lullaby, was associated with the color 
blue by the largest number in each class; a wavy line won in 
the upper grades, replacing the irregular line voted for by 
the younger children; the picture of a sleeping child was the 
unanimous first choice; a reclining ‘‘stick man’’ was the re- 
sponse of all grades. 

2. Red was named for the ‘‘First Brigade March’’ by all 
but the fourth grade; the regular zigzag line won the largest 
response in the grades of high school while the irregular dotted 
line was selected by the elementary school children; eight 
classes associated skaters with the march music while one 
group chose the workmen with sledgehammers; dancing and 
marching posture divided the votes for the stick men, march- 
ing being recognized as appropriate in the last years of high 
school. 

3. ‘‘Andante Cantabile’’ evoked considerable agreement of
response. The majority in every class chose to accompany the
music with the color black, the picture of death, and the stick
man that was drooping with grief. The six highest grades
gave the straight black bar as the appropriate line. 

4. The pizzicato ‘‘Sylvia Ballet’’ won the agreement of the 
majority in each class in lines—irregular dots, and pictures— 
ice skaters. The groups again confused the stick men that 
were dancing and marching, six having chosen the dancing 
posture. The children above the seventh grade indicated that 
the music suggested the color yellow, while the younger chil- 
dren named all colors except black. 

5. It is the belief of the writer that the present study based
on elementary principles should be investigated on a larger
scale to discover the extent to which auditory and visual fac-
tors should be combined to produce a heightened and unified
emotional effect. Movies, advertising, education and perhaps
television could profit by further scientific studies on what the
public’s ear suggests to the eye.

















A TEST FOR TRIDIMENSIONAL STRUCTURAL
VISUALIZATION¹


A NEW TEST FOR MECHANICAL INSIGHT DE- 
SIGNED PRIMARILY TO MEASURE ABILITY 
OR APTITUDE IN DRAFTING 


JOHN EDMUND CRAWFORD 
Pittsburgh, Pennsylvania 


Capacity to deal accurately and rapidly with spatial
relations is a prominent factor in such jobs as designer, 
draftsman, patternmaker, and engineer. The impor- 
tance of this capacity in the total ability profile of any job 
depends on the weight it has as a prerequisite to training in 
that field and on the use the particular job demands of it. 
Certainly the design draftsman has almost constant use of this 
particular aptitude in his job, and there is reason to believe 
that it is a very important prerequisite to his training for that 
job. Any test that can objectively detect or measure this 
ability becomes a useful instrument in counseling. 
Apparently, the new test described here measures the spatial- 
relations function inherent in the draftsman’s total ability 
pattern. The scheme of this test is a large nine-piece tridi- 
mensional puzzle so designed that the manipulative factor is 
small compared to the insight or structural visualization fac- 
tor. A good score on this test demands rapid accurate per- 
formance that involves simultaneous consideration of three 
dimensions. The baseboard has depressions and projections 
upon which the test pieces fit together in only one position. 
This design eliminates other chance placements that might 


1 Developed and standardized by John Edmund Crawford and Dorothea 
M. Crawford. Distributor, Psychological Corporation, 522 Fifth Avenue, 
New York, N. Y. 
bring a finally correct but less measurable solution. The gen-
eral population norms obtained with this test seem to indicate
that it can differentiate the real draftsman from the average 
man in industry, placing the average designer at 92-percentile 
and the average detailer at 80-percentile. 

Other important factors enter into drafting, such as mathe-
matics ability and drawing technique, which must also be as-
certained in any guidance concerning drafting work. The 
presence of the single factor of spatial relations ability as in- 
dicated by this test, does not in itself predict drafting aptitude 
or measure drafting ability. Little weight should be given 
any score on this test in the absence of confirming evidence 
from other sources; but either a very high or a very low score 
is probably a valid basis upon which to seek such confirming 
data. 

To what extent this test may predict success in fields other 
than drafting will vary from job to job, depending on how 
much of the ability this test measures is involved. Like all
form-boards, this test can furnish a standard situation in 
which to make observations of energy rate, method of attack 
and work, persistency, and others. 


PRELIMINARY MODEL TOO DIFFICULT 


The first form of the present test was more complex, but 
proved much too difficult a problem for the average person. 
It consumed an average time of nearly 14 minutes, thus intro- 
ducing an administrative problem where a large group must 
be tested. (Because of the form-board nature of the test, it 
can be given to only one person at a time, out of view of all
others to be tested.) This early model was revised through 
many changes into the present two forms of the new test. 


TWO EQUAL FORMS DEVELOPED 


Largely for the purpose of obtaining a reliability factor, two 
test forms were designed which seemed to be of equal power. 
Their patterns are similar, as shown in the accompanying illus-
trations. Each form consists of a 64-inch diameter circular
disc to be built on a 10-inch square baseboard. But the nine
blocks of Form 2 are the reciprocal of those in Form 1, and 
their relative position on the baseboard is out of phase with 
that of Form 1. Blocks from Form 1 will not fit on Form 2 
baseboard, except the isosceles triangles, which are identical. 
Only one test form would be necessary as a vocational test,
when standardized. 





DETAILS OF TEST FORMS 


Both forms are constructed of 23-inch poplar. The {-inch 
deep lid of each box acts as a tray for the nine blocks when the 
test is arranged ready for use. A piece of white toweling 
tacked in the trays prevents the blocks from marring and 
makes a distinct background for them. This toweling can be 
removed for washing or renewal. The recesses in the }-inch 
thick baseboards are #-inch deep, and the little projections are 
3-inch high. There are no marks or cues other than these to 
show where or how large the assembly is to be, on the base-
boards.

All blocks average j-inch high. They are smooth and 
glossy, showing no marks of any kind to indicate their places 
in the assemblies. They differ enough in size to avert at- 
tempts, with normal persons, to force them to fit incorrectly. 

All parts are enameled smoothly in a tan color. Any furni- 
ture wax cleans and polishes this surface nicely, and was used 
after every few hours of use of the forms in the standardizing 
procedures for this test. 


ADMINISTRATION OF THE TEST 


The standard routines for giving the two forms were similar 
except for the arrangement of the blocks in the trays. The 
illustration shows the setup for each form. 








The blocks for either form were arranged in the tray before
the test was presented. This arranging usually was done be-
hind cover of the baseboard raised to a vertical position as a
screen between the person’s eyes and the blocks in the tray.

The illustration shows the operating position of the test 
across the table, with the baseboard directly in front of the 
person, the tray farthest from him toward the examiner. The 
directions were standardized as follows: 








As the person looks at the opened test before him, say ‘‘Put
all those pieces (pointing to pieces in tray) together on this
board (pointing) as fast as you can, to form a large round
(circular) block that will be flat on top. Remember—the
block must be round and flat on top, and all those holes must
be filled. . . . Ready? . . . Go!’’ Repeat these instructions
completely if the person does not indicate that he understands
the problem when he is asked if he is ready. Record exact
time in seconds from the signal ‘‘go’’ to the instant the last
piece is correctly in place. The assembly must be flat on top.
Allow no other credit. Make no comment during the trial.
Close the box at the end of the trial and turn over toward
examiner, to get ready for next administration.


POPULATION SAMPLE 


Test Form 1 was given to 367 men who came to the local 
United States Employment Bureau to file applications for jobs 
or to secure insurance or compensation information. These 
men were in good work standing with their employers. About 
12 per cent were temporarily laid off at the time, through no 
fault of their own. Nearly 18 per cent were well employed, 
but came seeking job improvement of some kind. 

These 367 cases were drawn at random from the flow of men 
into the Bureau, over a period of three weeks, from 9:00 to 
5:00. All were white and English-speaking. The 367 cases 
were subdivided in close proportion to the loads shown by rec- 
ords kept by the Bureau on their unskilled, skilled, technical 
and clerical classifications of applicants. 

Every man taking the test was shown that his test perform- 
ance record could in no way affect his Bureau record but was 
for the sole purpose of standardizing the test. Record was 
kept only of the man’s job or trade and his raw score in sec- 
onds ; no names or employers were listed. Men who appeared 
very nervous on the test, or otherwise not quite up to par, were 
noted; these cases were thrown out of the general sample 
taken, reducing it finally to 346 cases to be statistically com- 
piled. 

THE SCALE 


These test performance data, when tabulated, confirmed the
examiner’s feeling, when giving the test to these men, that
its difficulty seemed disproportionately greater for the very
slow ones than for the very fast ones. Inspection of the tabu-
lated data suggested that the adjusted scale might be similar
to the exponential function: y = (k)(c)^x. In the attempt to
set this general function to the actual data, it was assumed
that +3 sigma would be about 25 seconds and -3 sigma at
1000 seconds, with x measured downward (in sigma units)
from +3 sigma toward -3 sigma. Thus the general func-
tion for the curve was evaluated in these limiting terms:
y = 25(1.85)^x, so that

     +3 sigma =   25 seconds
      0 sigma =  158
     -3 sigma = 1000

This adjusted scale was subdivided into 24 blocks, four equal
divisions in each of the six sigma sections.
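
The limiting values quoted for the adjusted scale follow directly from the function y = 25(1.85)^x. As a minimal sketch (the function name and the conversion x = 3 - sigma are illustrative choices based on the statement that x is measured downward from +3 sigma):

```python
def seconds_at_sigma(sigma, k=25.0, c=1.85):
    """Raw score in seconds at a given sigma position on the adjusted scale.

    x is counted downward from +3 sigma, so x = 3 - sigma.
    """
    x = 3.0 - sigma
    return k * c ** x


if __name__ == "__main__":
    for sigma in (3, 0, -3):
        print(f"{sigma:+d} sigma: {seconds_at_sigma(sigma):.0f} seconds")
```

With these constants the function gives about 158 seconds at 0 sigma and about 1002 seconds at -3 sigma, in line with the rounded limits stated in the text.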

Upon this adjusted scale the 346 cases previously described
were then distributed, and gave a curve that was very little
skewed but rather close to a normal curve. The distribution
mean on this adjusted scale falls at 153.3. If the cases are
distributed on an unadjusted scale (linear) the mean falls
near 161, but the sigma deviations are not as true values as on
the adjusted scale, especially below the mean toward -3 sigma.
However, the difference between the distributions on these two
scales is negligible above the mean, hence this mathematical
technicality does not greatly affect the use of high test scores
for vocational guidance. All tabulated data in this report are
in terms of the adjusted scale described above.


TABLE 1

From the General Employable Male Industrial-Technical Population
Sample. N = 346. Adjusted Scale Values

Raw score     Sigma     Standard score     Percentile

    34        +2.5           7.5               99
    46        +2.0           7.0               97
    62        +1.5           6.5               93
    84        +1.0           6.0               84
   113        +0.5           5.5               69
   153         0.0           5.0               50
   208        -0.5           4.5               31
   380        -1.0           4.0               16


RELIABILITY 


S.D. of Mean of distribution = 4.6 seconds
P.E. of Mean of distribution = 3.1 seconds 
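
The two figures agree with the conventional relation between a probable error and a standard error, P.E. = 0.6745 × S.E. The report does not state this relation, so the following Python sketch is an assumption about how the second value was obtained; the function name is illustrative.

```python
# Half of a normal distribution lies within 0.6745 standard errors of its mean.
PROBABLE_ERROR_FACTOR = 0.6745


def probable_error(standard_error):
    """Convert a standard error to a probable error (conventional 0.6745 factor)."""
    return PROBABLE_ERROR_FACTOR * standard_error


if __name__ == "__main__":
    print(f"P.E. of mean = {probable_error(4.6):.1f} seconds")
```
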
TABLE 2
P.E. of Sigma of the Distribution

Raw score     Standard score     P.E. of sigma

    34             7.5           0.5 seconds
    46             7.0           0.7
    62             6.5           0.9
    84             6.0           1.2
   113             5.5           1.6
   153             5.0           2.3
   208             4.5           3.4
   380             4.0           5.8





DRAFTSMEN TESTED 


From the engineering drafting room of a very important 
rolling mill manufacturer, 21 design draftsmen were given 
Form 1 test. From the general population, scores of the 8 
design draftsmen were combined with these 21 scores to get a 
vocational drafting norm. These 29 designers had a raw score
range from 39 to 104 seconds, average at 65 seconds. This 
places them at standard score 6.4, percentile 92, in the male 
industrial-technical population. 

From the general population sample, 19 scores of detailers 
were found. Their average raw score was 94 seconds, stand- 
ard score 5.8, percentile 80. 

These 48 cases indicate that detail draftsmen and designers 
do better on this test than three-fourths of the general male 
industrial-technical population. 


RELATION OF RAW SCORE TO STANDARD SCORE 


Because the adjusted scale curve is exponential in nature,
y = (25)(1.85)^x, the relationship between a raw score P in
seconds and its standard score SS is:

(1) SS = 13.37 - 3.83 log P

(2) P = 10^{(13.37 - SS)/3.83}


Formula (1) is the most used, but the values in Table 1 are as 
much as the average vocational counseling situation requires. 
The percentile values in Table 1 were assigned from their re-
spective standard scores, upon the assumption that the distri-
bution of the 346 cases on the adjusted scale approximates 
the theoretical normal curve closely enough to permit such a 
translation. 
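
Formulas (1) and (2), together with the normal-curve translation to percentiles, can be sketched in Python as follows. The function names are illustrative, the logarithm is taken to base 10 (as formula (2) implies), and the percentile step assumes that standard score 5.0 is the mean and that one standard-score unit equals one sigma, which is how Table 1 is laid out.

```python
from math import log10
from statistics import NormalDist


def standard_score(raw_seconds):
    """Formula (1): standard score from a raw score in seconds."""
    return 13.37 - 3.83 * log10(raw_seconds)


def raw_score(ss):
    """Formula (2): raw score in seconds from a standard score."""
    return 10 ** ((13.37 - ss) / 3.83)


def percentile(ss):
    """Percentile under a normal curve with mean SS 5.0 and sigma 1.0 (assumed)."""
    return 100 * NormalDist(mu=5.0, sigma=1.0).cdf(ss)


if __name__ == "__main__":
    for seconds in (65, 94, 153):
        ss = standard_score(seconds)
        print(f"{seconds} s -> SS {ss:.1f}, percentile {percentile(ss):.0f}")
```

A raw score of 65 seconds gives a standard score of about 6.4 and 94 seconds gives about 5.8, the values reported for the design and detail draftsmen.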


SOME CORRELATIONS 


The following correlations, in a measure, show some of the 
possibilities of the test and its uses: 

(1) Correlation of the scores on Test Form 1 with scores on
Form 2 given three weeks later to the same 62 boys picked at
random from high school shop courses: r = .89 ± .02

(2) Correlation of the scores on Form 1 of 48 boys chosen
at random from high school trigonometry and solid geometry
courses with their respective teacher-ranking as to their mathe-
matics insight: r = .59 ± .06

(3) Correlation of the scores on Form 1 of 39 boys in high
school drafting courses with their respective teacher-ranking
as to degree of drafting insight (not drawing technique):
r = .91 ± .02 (Their average score on Form 1 was 116, SS 5.46,
percentile 68.)

(4) Correlation of the same 39 boys’ Form 1 scores, as in
(3), with their respective Otis I.Q.’s: r = .04 ± .11

(5) Correlation of the scores on Form 2 of 32 unemployed
young men applying at an employment bureau, with their
respective scores on the Minnesota Paper Formboard (Revised
Form A-A): r = .55 ± .08
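
The figure attached to each coefficient appears to be its probable error. The report does not give the formula; the sketch below assumes the conventional one, P.E. = 0.6745(1 - r²)/√N, which reproduces all five reported values to two decimals. The function name is illustrative.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Conventional probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


if __name__ == "__main__":
    # (r, N) pairs as reported for correlations (1) through (5).
    reported = [(0.89, 62), (0.59, 48), (0.91, 39), (0.04, 39), (0.55, 32)]
    for r, n in reported:
        print(f"r = {r:.2f}, N = {n}: P.E. = {probable_error_of_r(r, n):.2f}")
```
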


PRACTICE EFFECT ON RAW SCORE 


Form 2 test was given to 28 boys in a high school shop group, 
each with six successive trials. Before each trial, the boy’s 
previous score was told him and he was urged to greater speed 
if possible. These data were averaged and follow the char- 
acteristic learning curve: 
Trial      Average score

  1        131 seconds
  2         73
  3         60
  4         41
  5         33
  6         32


The successive trials tend to level off very rapidly. The score 
on trial 1 seems therefore to be the best index, not involving 
this learning effect. 

TABLE 3
Tentative Vocational Norms

The following vocational norms drawn from the general male sample
(plus the 21 designers previously described) are significantly spaced
along the scale. Scores in range and average columns are in seconds.

Occupation            Range      Average    S.S.    Percentile

Design draftsmen      39-104       65       6.4        92
Detail draftsmen      58-161       94       5.8        80
Machinists            56-184                5.6        75
Mechanics             69-246                5.4        65
Laborers in shops     81-823                4.6        35





CONCLUSIONS AND INFERENCES 


This new test appears to measure some aptitude or ability 
more possessed by design and detail draftsmen than by three- 
fourths of the adult male technical-industrial population. The 
nature of the test itself seems to imply that it is a measure of
insight into spatial relations, here defined as tridimensional 
structural visualization. Test data indicate that this ability 
is probably present to a greater degree in the designer than in 
the detailer. 

The high scores of all the expert draftsmen tested somewhat 
reflect the validity and reliability of the test. The correlation 
factor of .89 + .02 of 62 boys’ scores on the two presumably 
equal power forms of the test also suggests reliability. 
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The test is relatively simple to administer and does not con- 
sume much time. It can be given to only one person at a 
time, out of view of all others to be tested, but can be given to 
about 12 persons in an hour, allowing 5 minutes per test. The 
test blocks apparently do not offer much of a manipulative 
problem ; they seem to be small enough for easy handling, yet 
distinct enough to provide adequate spatial stimulus. 

Considered with the necessary supporting data, this test 
offers another objective measure useful in many vocational 
guidance situations. While developed to measure one factor 
in aptitude for drafting, it may be used to check other data in 
guidance toward fields of work involving drafting ability, or 
ability to accurately visualize and manipulate spatial relations. 








THE STUDENT SKILLS INVENTORY: A STUDY 
HABITS TEST 


NORMAN M. LOCKE 
Hunter College of the City of New York 


INTRODUCTION 


In studying the component parts of scholastic ability several
variables have been employed which, although in some
degree related, have not accounted in full for that ability.
Intelligence tests, hours of study, and reading tests have been 
considered most frequently as measures of scholastic ability 
and used as variables. 

Although offhand one would expect a high degree of rela- 
tionship between intelligence and scholastic ability the corre- 
lations obtained are not much more than fair. The number 
of hours which the individual spends in studying correlates 
with school grades, but here too the coefficient is no more than 
fair. Although this correlation when combined with that of 
school grades and intelligence becomes appreciable it does not 
account for the entire variance of school grades. The same 
general statements can be made with respect to reading ability. 
By combining intelligence, hours of study, and reading ability, 
and correlating with school grades we increase the original 
coefficient but still we do not gain a correlation sufficiently 
high to enable us to state that these are the sole components 
of scholastic ability. 

It was with this problem in mind that the author constructed 
the present inventory, working on the assumption that the 
technique of study was a clue to one of the unidentified parts 
of scholastic ability. The present paper shows the degree to 
which such an assumption was justified. 
The first form of the Student Skills Inventory was assembled
by gathering statements which seemed significantly related to 
study skills. The sources of these statements, for the most 
part, were the books and pamphlets listed as references. These 
statements were grouped under several headings and presented 
as a whole. There were 116 items in all, which were given in 
the form of a five-point scale. In the Directions the reader 
was told to indicate the degree to which his habits corresponded 
to the statements by circling the number under the column 
which best expressed his own behavior. The columns were 
given the usual headings of never, seldom, sometimes, often, 
and always. The reader was also informed of the purpose of 
the Inventory and given assurance as to its use. 

The Inventory was then administered to 170 college women, 
freshmen and sophomores at Hunter College. The total scores 
were ranked and the highest 44 scores and the lowest 45 were 
chosen for further analysis. A distribution was made of the 
scatter of responses on each item for the highest group and 
again for the lowest group. For each item there were five 
values representing the combined response of the students in 
the upper group and the combined response of those in the 
lower group. Graphs of these items for each of the 116 items 
of the upper group were superimposed upon those of the lower 
group. Each graphed item was subjected to scrutiny to deter- 
mine whether it differentiated the upper group from the lower. 

That item which would differentiate the groups perfectly 
would be one to which all those in the upper group would 
circle the number for ‘‘always’’ and those in the lower group 
would circle the one for ‘‘never.’’ Since this perfection was 
never found, certain criteria had to be set up in order to deter- 
mine the value of the item in question. If the curves skewed 
in different directions, one toward the ‘‘always’’ and one 
toward the ‘‘never,’’ the item was retained. Conversely, if 
the curves were alike the item was discarded. A further criterion 
was that if the modal response of one group fell at 
‘‘never,’’ ‘‘seldom,’’ or ‘‘sometimes,’’ that of the other group 
had to fall at ‘‘often’’ or ‘‘always’’; and if the mode of one group 
fell at ‘‘sometimes,’’ ‘‘often,’’ or ‘‘always,’’ that of the other 
group had to fall at ‘‘never’’ or ‘‘seldom.’’ When the numerical values 
of these modes were close, the mode of one group had to be greater 
than the second mode of the other. 

As a result of this weeding-out the number of items was 
reduced to 56 and regrouped. The headings, given to aid the 
student in answering, were Notetaking, Study, Time budget, 
Reading, Examinations, Foreign language, and Vocabulary. 
The same form of presentation was kept: the student circled 
the number in the column of his choice. Of these 56 items, 44 
were either positively stated or favorable and 12 were either 
negatively stated or unfavorable. This, of course, served to 
break up any set for answering specific numbers rather than 
the descriptive classification. The numbers were arranged 
from 1 to 5 for the 44 items and from 5 to 1 for the 12 items. 
This arrangement resulted in high scores indicating satisfac- 
tory skills and low scores less satisfactory skills. The final 
score was simply the sum of the numbers circled. 
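
The scoring rule just described can be sketched as follows. This is a minimal illustration, not the author's procedure: the function names are hypothetical, and because the text does not say which 12 items were unfavorable, items 45 to 56 are used as an arbitrary stand-in.

```python
# Column headings in the order printed on the Inventory.
COLUMNS = ["never", "seldom", "sometimes", "often", "always"]


def printed_number(column, unfavorable):
    """Number printed under `column` for one item.

    Favorable items run 1..5 from "never" to "always";
    unfavorable items run 5..1, as the text describes.
    """
    position = COLUMNS.index(column) + 1
    return 6 - position if unfavorable else position


def total_score(responses, unfavorable_items):
    """Sum of the circled numbers over all answered items.

    responses: dict mapping item number -> circled column heading.
    unfavorable_items: set of item numbers keyed in reverse.
    """
    return sum(
        printed_number(column, item in unfavorable_items)
        for item, column in responses.items()
    )


# Hypothetical answer sheets (assumption: items 45-56 are the 12
# unfavorable statements; the text does not identify them).
unfavorable = set(range(45, 57))
all_middle = {item: "sometimes" for item in range(1, 57)}
all_always = {item: "always" for item in range(1, 57)}
```

Under these assumptions a sheet answered ‘‘sometimes’’ throughout sums to 56 × 3, and a sheet answered ‘‘always’’ throughout earns 5 on each favorable item but only 1 on each unfavorable one.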

This revised form was then administered in October to an 
additional 135 students at Hunter College and exactly four 
weeks later was given again. The resulting data were in the 
form of a total score and seven section scores. 


RESULTS AND DISCUSSION 


Reliability. The retest reliability coefficient of the Inven- 
tory was .80. To reveal the degree of internal consistency the 
split-half technique was employed and the results stepped up 
with the Spearman-Brown formula. The resulting coefficient 
was .81. The reliability of the Inventory was considered 
satisfactory and the self-consistency equally so. 
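
The Spearman-Brown step-up mentioned above can be sketched as below. The half-test correlation is not reported in the text; the value .68 is only a back-calculated illustration of an input that would step up to roughly .81, and the function name is hypothetical.

```python
def spearman_brown(r_half, lengthening=2):
    """Estimated reliability of a test `lengthening` times as long
    as the part whose reliability is r_half (Spearman-Brown formula)."""
    return lengthening * r_half / (1 + (lengthening - 1) * r_half)


# Assumed half-test correlation, chosen for illustration only.
r_full = spearman_brown(0.68)
```

With a half-test correlation near .68 the stepped-up coefficient comes out near .81, the figure reported for the Inventory.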

Validity. Through the kindness of Miss Dorothy B. Ball, 
Assistant Registrar of Hunter College, the records of about 
3000 students were made available. These were examined in 
order to obtain a group of subjects with high and low grade 
averages. Cumulative grades are recorded as the ‘‘index,’’ a 
weighted average of combined grades with A at 4 and F at 0. 
Students at the upper and lower 2 per cent of the distribution 
were asked to volunteer as subjects. Fewer of the students 
at the low end of the curve appeared, necessitating a second 
appeal to others. When taking the Inventory these students 
were instructed not to place their names or any identifying 
marks on their papers. 

The final number of cases was 104. There were 57 in the 
high index group with classes ranging from Upper Freshman 
to Upper Junior. The highest index in this group was 4.0, 
the lowest 3.0, and the mean was 3.3. There were 47 students 
in the low index group with a range of class from Upper 
Freshman to Lower Junior. The highest index here was 1.6, 
the lowest 0.4, and the mean was 1.2. There was no attempt 
made to match these students in any variable other than college 
class. The only selection on the part of the experimenter was 
to choose a group of B+ and D- students. 

One year later the Inventory was administered to an addi- 
tional 200 students and after ranking their indices the highest 
and lowest 26 students were chosen as forming the high and 
low scholarship groups. In each group the classes ranged 
from the Upper Freshman to Lower Junior. The range of 
index of the higher scholastic group was 3.8 to 3.0 with the 
mean at 3.3, and that of the lower group was 2.0 to 1.6 with 
the mean at 1.8. 

In both instances the high index group scored higher on the 
Inventory than did the low index group. In the earlier test- 
ing the mean for the high index group was 195 with the stand- 
ard deviation 26.3 and that for the low index group was 184, 
standard deviation 24.1. When tested for reliability of the 
mean difference it was found that the chances were 99 in 100 
that the obtained difference was a true difference. These 
results are shown in Table 1. Since the number of cases in 
each group was relatively small in the later testing, standard 
deviations were computed on the basis of N-1. The mean of 
the high index group in the later testing was 198.3 with a 
TABLE 1 

Mean, σ, and Significance of Mean Difference between High and Low 
Scholarship Groups. Subscript 1 Indicates the Earlier Testing 
and 2 the Later One 

               High₁     Low₁      High₂     Low₂
M              195       184       198.3     179.6
σ              26.3      24.1      25.7      22.5
D                   11.0                18.7
σ_D                  4.9                 6.7
D/σ_D                2.2                 2.8
Chances            99/100              99/100





standard deviation of 25.7 and that of the low index group 
was 179.6 with a standard deviation of 22.5. When testing 
the reliability of the mean difference the critical ratio was 
interpreted by Fisher’s t distribution (4). Again, the chances 
were 99 in 100 that this difference was a true difference. 
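
The critical ratios in Table 1 can be checked with a short calculation. This sketch assumes the usual standard error of a difference between two uncorrelated means, which the text does not spell out; the function name is hypothetical, and the inputs are the means, standard deviations, and group sizes reported above.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (critical ratio, standard error of the difference)
    for two independent group means."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff, se_diff


# Earlier testing: 57 high-index and 47 low-index students.
cr_earlier, se_earlier = critical_ratio(195, 26.3, 57, 184, 24.1, 47)

# Later testing: 26 students in each group.
cr_later, se_later = critical_ratio(198.3, 25.7, 26, 179.6, 22.5, 26)
```

Rounded to one decimal, these agree with the tabled standard errors of 4.9 and 6.7 and the ratios of 2.2 and 2.8.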

In the first testing the range of the high group was 133 to 250 
and that of the low group was 146 to 246. In the second test- 
ing the ranges were 146 to 241 for the high group and 143 to 
231 for the low group. In both instances the distribution of 
the high group piled up toward the upper end and that of the 
low group toward the lower end. These distributions were 
therefore tested for skewness and significance of skewness. 
The formulae employed were $Sk = P_{50} - \tfrac{1}{2}(P_{90} + P_{10})$ and 
$\sigma_{Sk} = .5185\,(P_{90} - P_{10})/\sqrt{N}$. 

As can be seen in Table 2 both distributions of the high 
group are significantly skewed upward with the chances of 
such significance being 99.9 in 100. The distributions of the 
low groups skew downward, significantly so in the later testing 
and with 93 chances in 100 of significance in the earlier testing. 
Such differences in results as are found between the earlier 
and later testing can be explained by the method of obtaining 
subjects. Since the subjects in the earlier testing came at will 
TABLE 2 

Skewness, σ, and Significance of Skewness Difference of High and Low 
Scholarship Groups. Subscript 1 Indicates the Earlier Testing 
and 2 the Later Testing 

               High₁       Low₁       High₂       Low₂
Sk             6           -1         10          -4
σ_Sk           .63         .67        1.48        1.12
Sk/σ_Sk        9.5         1.5        6.8         3.6
Chances        99.9/100    93/100     99.9/100    99.9/100





there was a tendency to have the lower scholarship group 
somewhat curtailed. In the later group the students were 
tested first and chosen after the Inventory had been com- 
pleted. 
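
The percentile measure of skewness used here can be sketched as follows. The sign convention follows the formula as printed, so that a distribution piled up toward the upper end gives a positive value. The percentile points in the example are hypothetical, since the paper reports only the resulting Sk and σ values, and the function name is an assumption.

```python
import math


def percentile_skewness(p10, p50, p90, n):
    """Return (Sk, standard error of Sk) from the 10th, 50th,
    and 90th percentile points of a distribution of n cases."""
    sk = p50 - 0.5 * (p90 + p10)
    se_sk = 0.5185 * (p90 - p10) / math.sqrt(n)
    return sk, se_sk


# Hypothetical percentile points for a group of 57 students.
sk, se_sk = percentile_skewness(p10=150, p50=200, p90=230, n=57)
ratio = sk / se_sk  # compared with the normal curve, as in Table 2
```

Here the median lies above the midpoint of the 10th and 90th percentiles, so Sk is positive, which corresponds to what the text calls skewing upward.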

Although it was rather difficult to obtain two groups which 
were clearly different in scholastic achievement and in spite 
of the reluctance of some individuals to expose themselves to 
such an inventory as the present one, the data were such that 
one is justified in stating that the Inventory differentiates high 
index students from low index students. The difference in 
means was found to be a real difference and the shapes of the 
curves of the two distributions tended to be distinct. 

Some intercorrelations. Correlations were run between the 
total score on the Inventory, the index of the students, intelli- 
gence, and hours of study. Through the kindness and co- 
operation of Dr. M. K. Gallagher of Hunter College the author 
was supplied with the intelligence scores. These were American 
Council on Education raw scores. The average score of 
133 cases was 220.4, standard deviation 46.1. Hours of study 
was obtained some time after the students had taken the first 
test and was concealed as a unit in a study which purported 
to be concerned with the general distribution of the student’s 
time. In Table 3 we have the several correlation coefficients. 
The most interesting correlation for our immediate purpose is 
that between the Inventory and index, one of .35.¹ The significance 
of this correlation is revealed by a comparison with 
that between intelligence and index, namely .38. In other 
words, the Student Skills Inventory was as closely related to 
the grades that students earned as was their intelligence as 
measured by the A. C. E. Various other coefficients can be 
seen in the table. 

¹ A coefficient of .39 was obtained when the scores of the second test 
for reliability were correlated with the index. See Table 6. 

TABLE 3 

Correlations between Inventory (1), Index (2), Intelligence (3), and 
Hours of Study (4) 

          r        N
12       .35      135
13       .18      133
14       .28      131
23       .38      133
24       .26      131
34      -.03      129

Correlations between Inventory and index. Through the 
kindness of several individuals, the author is able to present 
data gathered at four colleges in addition to Hunter College. 
Dr. M. R. Schneck of the University of Arizona, Dr. C. I. 
Mosier and Dr. A. C. Van Dusen of the University of Florida, 
Dr. W. J. E. Crissy and Dr. J. F. Dashiell of the University of 
North Carolina, and Dr. G. Dudycha of Ripon College admin- 
istered the Inventory and collected information on student 
indices. The author is glad to express here his appreciation 
of their interest and cooperation. 

In all, there were 237 students who completed the Inventory 
in addition to the 135 at Hunter College. The official class 
of these students is shown in Table 4. Grade indices were 
computed for the semester immediately preceding the one 
during which the Inventory was administered. These are pre- 
sented in Table 5. Four of the colleges have an average grade 
of C +, with Ripon averaging B. The standard deviations are 
about the same. The high scores for all reach the maximum or 
come close to it, and there is a tendency for three schools 
TABLE 4 


Distribution of the Official Classes of Students Used in the Correlation 
between Inventory and Index 








N UF LS US LJ UJ LSrUSr SJ ? 
pS” eS se. .1 3 36 1 6 0 mae... 9 
POI, sictcsccneencde 63 3 1 0 20 6 4 <= 1 
mS” ee ee 8 4 9 6 - — 2 
North Carolina ..... 50) «6(—0 - 415 - 380 ~ 6 - - = 
BOE  kecttaniegnntnne 74 2 - 60 - ill - 10 -- 1 





to stay above the minimum score, with Florida and North 
Carolina dropping below the others. 

Since the students at Hunter College had taken the Inven- 
tory twice for purposes of reliability, both scores were corre- 


TABLE 5 

Mean, Standard Deviation, and Range of Index for All Groups 

                      M         σ       High     Low
Arizona               2.8*      .8      5.0      1.0
Florida               2.5†      .7      4.0      0.0
Hunter                2.6†      .5      3.9      1.2
North Carolina        2.6†      .8      4.0      0.8
Ripon                83.2‡     6.2     96.6     70.3

* A = 1, F = 5.   † A = 4, F = 0.   ‡ Per cent scores. 


lated with index. The correlation coefficients for all colleges, 
given in Table 6, range from — .07 to .37. The median correla- 
tion, if such can be used, is .28. There is a strong suspicion 
that these coefficients would be much higher if an index con- 
structed on the basis of the entire college career were used. 
This is the case with the Hunter correlations, the full index 
correlating .35 as opposed to .29 for the first testing, and .39 
as opposed to .32 for the second testing. These correlations 
were computed from the scores of the same group of students. 

Tentative norms. The Inventory should be considered as a 
whole although it is made up of various sections. It is admit- 
TABLE 6 

Correlations between One-Semester Index and Inventory, and Total 
College Index and Inventory for Hunter. The Italicized 
Coefficient Was Used in Table 3 

                          One semester          Total
                          N        r           N        r
Arizona                   50      .37
Florida                   63     -.07
Hunter 1st testing       115      .29         135      .35
Hunter 2nd testing       113      .32         134      .39
North Carolina            50      .09
Ripon                             .28








tedly of value, however, to examine the sections themselves, 
bearing in mind the small number of items in some of them. 
That there is justification in so doing can be seen by comparing 
the obtained scores and scores which might have been expected. 
If the number of items in each section is multiplied by 1, 2, 
3, 4, and 5, the steps of the scale, a comparison can be made 
between the obtained scores and an expected score. The Note- 
taking, Study, and Time budget sections consisted of 7 items 


TABLE 7 

Obtained, Combined, and Expected Means for the Sub-Tests, and Standard 
Deviations of the Hunter Group 
N = 372 

                   Note    Stud    Time    Read    Exam    ForL    Voc
Arizona            22.1    22.7    23.6    47.2    24.5    13.5    19.6
Florida            21.5    20.8    20.4    49.2    24.1    12.8    20.1
Hunter             24.6    24.8    23.8    52.5    25.8    16.9    21.1
North Carolina     22.3    20.1    20.9    49.3    23.1    14.1    20.2
Ripon              21.6    21.8    23.3    49.9    25.3    13.7    19.5
Combined           22.8    22.6    22.7    50.3    24.9    14.7    20.3
Expected           21.0    21.0    21.0    48.0    24.0    15.0    18.0
σ (Hunter)          3.5     4.4     4.6     8.1     5.0     4.0     4.3
each, the Reading section was made up of 16 items, and the 
Examinations, Foreign Language, and Vocabulary sections 
consisted of 8, 5, and 6 respectively. The expected average 
would be the number of items in the section multiplied by 3, 
the weight of the middle step. Table 7 reveals that the com- 
bined obtained averages of the five colleges differ but slightly 
from the expected averages. The standard deviations of the 
Hunter College group have been included for a rough rule-of- 
thumb comparison. 
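
The expected averages follow directly from the item counts given above; a brief sketch of the arithmetic is shown here. The dictionary and variable names are illustrative only.

```python
# Number of items in each section, as stated in the text.
SECTION_ITEMS = {
    "Notetaking": 7,
    "Study": 7,
    "Time budget": 7,
    "Reading": 16,
    "Examinations": 8,
    "Foreign language": 5,
    "Vocabulary": 6,
}

MIDDLE_STEP = 3  # weight of "sometimes" on the five-point scale

# Expected section average = number of items x middle step.
expected = {name: count * MIDDLE_STEP for name, count in SECTION_ITEMS.items()}
expected_total = sum(expected.values())
```

The section values reproduce the expected row of Table 7 (21, 21, 21, 48, 24, 15, 18), and their sum is the expected total of 168 for the 56 items.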

In Table 8 we have the average total scores for all groups 


TABLE 8 

Obtained and Combined Means, Standard Deviations, and Range of Total 
Scores for All Groups 
N = 372 

                   Mean      σ       High    Low
Arizona            173.2     20.3    213     123
Florida            169.1     22.0    227     121
Hunter             187.6     23.8    245     127
North Carolina     169.9     21.3    217     124
Ripon              176.1     21.8    218     129
Combined           177.9     22.3    245     121





as well as the standard deviations and range. As above, the 
combined average of 177.9 can be compared with the expected 
average of 168. The combined standard deviation is 22.3, and 
the range is 121 to 245. Dividing 6 standard deviations by 5, 
we arrive at the following levels based upon the present data: 
Excellent—218 to 244, Good—191 to 217, Average—164 to 190, 
Fair—127 to 163, Poor—100 to 126. 


SUMMARY AND CONCLUSIONS 


Seeking to devise an inventory which would reveal study 
habits, a 56-item scale was constructed. The retest reliability 
of this inventory proved to be .80 with a coefficient of internal 
consistency of .81. Significant differences were obtained in the 
mean scores of students with high grades and students with 
low grades. Comparisons between correlations of Inventory 
score and grades, and intelligence and grades showed that the 
coefficients are about the same. The median correlation be- 
tween the Inventory and grades for five colleges was .28. Ten- 
tative norms were constructed. 

The Student Skills Inventory has proven to be a reliable in- 
strument, one which differentiated students with high grades 
from students with low grades, and one which correlated to 
some degree with the college grades of students. Although, 
to date, no work has been done with secondary schools the 
Inventory is believed to be applicable to that level. 

To what use can the Inventory be put? For one, it can be 
used for diagnostic purposes. If students with low college 
grades have low total scores on the Inventory some degree of 
causality can be inferred. It may locate a specific cause of 
scholastic difficulty by showing one or more low sub-test scores. 
By the same token, an individual who desires to improve in 
study skills, regardless of score, can discover wherein the 
avenue to such improvement lies. 

At the moment, a statement as to the prognostic ability of 
the Inventory must be held in abeyance until further correla- 
tions in several other colleges clear up the present ambiguity 
of correlation. However, if the median correlation be accepted 
as some indication of relationship, then success or failure in 
college can be predicted when the Inventory is used in a battery 
with other tests.² 

Students can be classified as to study habits on the basis 


2 It would be ideal to be able to present a regression equation in which 
several factors combine to produce the entire structure of school grades. 
Such an equation should have a minimum intercorrelation of factors. One 
of the many offshoots of the present problem was a correlation between 
the score on the reading habits section of the Inventory and the compre- 
hension score on the Iowa Silent Reading Test. The mean score of 38 
Hunter College students on the Iowa was 178.2, σ 16.5, and on the reading 
section of the Inventory was 52.0, σ 6.9. The correlation coefficient 
between these two variables was .04. 
either of total score or sub-test score. These classifications can 
be used for guidance programs, either group or individual, or 
orientation courses. 

Using the first form of the Inventory the author has studied 
the effect of a general psychology course upon study habits 
and the effect of an orientation course upon study habits. The 
results of these studies will appear soon. Thus, the Inventory 
can also be used to evaluate guidance work. By administering 
it at the beginning and at the end of a guidance program the 
amount of improvement and the area of improvement can be 
seen. 

The author hopes to be able further to test the use of the 
Inventory with respect to diagnosis, prognosis, classification, 
and evaluation and to present results at an early date. 
THE RELATIVE IMPORTANCE OF CONTAINERS 
AND LABELS IN DETERMINING CONSUMER’S 
PREFERENCES FOR SEVERAL 
BRANDS OF TOMATO CATSUP 


JOHN G. WATKINS 


Teachers College, Columbia University 


INTRODUCTION 


PROBLEMS involved in the attractive packaging of various 
commodities seem to have received little attention 
compared to other phases of advertising and salesmanship. 
Viteles (1) in his article ‘‘Psychology in Industry’’ reviews 
various research fields in industry and points out the 
need for work along this line. Franken and Larabee (2) have 
reported studies in this field and have suggested research 
methods. Three other studies might be mentioned which have 
attacked various angles of the problem, those by Feller (3), 
Warner (4) and Balchin (5). In spite of these excellent 
studies the impression received in perusing the literature avail- 
able is that an important field of advertising research has been 
greatly neglected. 


THE PROBLEM 


A study reported by Hovde (6) on consumer preferences 
for small glass containers noted significant differences of choice 
for variously shaped bottles and jars. In a Philadelphia re- 
tail store customers were asked to rank six differently shaped 
jars containing caviar and five differently shaped jars of bis- 
mark herring. The bottles were devoid of labels. 

Orders of preference for the six caviar jars and the five 
herring bottles were computed by averaging the rankings of all 
judges. Relative per cents of importance were then calculated 
by arbitrarily assigning 100 to the brand standing first and 
dividing its average ranking by the average ranking of each of 
the other bottles in turn. The assumptions justifying this 
procedure were not stated. Reliability was secured by figur- 
ing cumulative averages and gathering more data until the 
relative position of each bottle became stable. 
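
As described, Hovde's relative per cents amount to a simple ratio of average ranks. The sketch below is an illustration under that reading; the jar names and average ranks are invented, since Hovde's figures are not reproduced in this paper, and the function name is hypothetical.

```python
def relative_percents(average_ranks):
    """Assign 100 to the item with the best (lowest) average rank and
    scale each other item as best_rank / its_rank * 100."""
    best = min(average_ranks.values())
    return {name: 100 * best / rank for name, rank in average_ranks.items()}


# Hypothetical average rankings for three jars (lower is better).
example = relative_percents({"jar_1": 2.0, "jar_2": 2.5, "jar_3": 4.0})
```

With these invented ranks the jars receive 100, 80, and 50. The ratio treats average ranks as if they lay on a scale with a true zero, which is the kind of unstated assumption noted above.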

Two points in this procedure detract from the value of the 
study : 

1. Bottle shape is judged as if it were a factor completely 
independent of the labels usually on the bottle. 

2. The method of averaging takes no account of the probable 
normality of judgment distributions. 

Under point 1 an artificiality is introduced, as customers are 
asked to indicate preferences when a factor of importance has 
been removed. This factor of label attractiveness always ap- 
pears under normal merchandising conditions. In point 2 
we find a lack of sound statistical treatment. Since there is 
no assumption of normality and a corresponding treatment of 
the data, the averages and their relative values are left on an 
unsound basis. Although normality might not hold for these 
particular distributions, the weight of past evidence indicates 
that in the absence of evidence to the contrary, the assumption 
of normality is the safest procedure. No measures of disper- 
sion are calculated, and hence the relative reliability of the 
averages is problematical. The method of assuring reliability 
by computing cumulative averages is somewhat clumsy, and 
better methods of handling this problem are available. The 
study, in other respects, is an admirable piece of work on a 
much neglected problem. 

In the following study the attempt was made to remedy the 
two previous criticisms of method, but the real purpose goes 
much further. Hovde finishes his report with the following 
statement: ‘‘The seemingly lengthy procedure of testing is 
fully justified when one considers and compares the increased 
possibilities in sales and display value gained by the reinforce- 
ment of the commodity with the proper glass container.’’ 
Immediately the question arises, is the shape of the glass container 
the most important factor in determining consumer’s 
choices of attractiveness? In the present study an effort is 
made to measure the relative importance of bottle and label in 
the consumer preferences for five well-known brands of tomato 
catsup. This product is, of course, more widely used than 
caviar or bismark herring. 


METHOD OF GATHERING DATA 

The customers of a grocery store in a small western city 
served as judges. Each subject was handled as follows: As a 
customer, while shopping in the store, approached the display, 
he or she was accosted and asked, ‘‘I beg your pardon, but 
could I have your opinion on my display here? I am doing 
independent research and would very much appreciate it if 
you have a minute to spare.’’ If the customer indicated that 
she (of the 100 judges 70 were women and 30 men) was in a 
hurry, no further effort was made to secure her cooperation. 
Most people out of natural curiosity were willing to act as 
judges. 

The subject’s attention was first directed to the display (see 
accompanying photo), and the question was asked, ‘‘Would 
you please rank the five brands of tomato catsup in order of 
attractiveness of bottle shape. Which of the brands do you 
think has the most attractively shaped bottle? Do not let the 
label or anything other than the bottle influence your decision.’’ 
The choice was noted on the data sheet, and then she 
was asked, ‘‘And your second choice? Third? And which of 
these two do you prefer?’’ indicating the two remaining 
brands. The one unselected was then placed last on the data 
sheet. The five brands, C.H.B. (formerly called California 
Home Brand), Del Monte, Heinz, Sniders and Van Camps 
were arranged in alphabetical order on the display shelf. 
Under them were placed the letters A, B, C, D, and E in the 
same order (see photo). The subjects were urged to give 
their choices by letter. 

The preferences on bottle shape having been secured, the 
judge was next asked, ‘‘would you now please rank these 
brands in order of attractiveness of labeling?’’ A similar 
warning as to specificity of attention was made as in the first 
judgment. By the time these were noted down, the customer 
usually indicated fatigue and an unwillingness to continue 
attention on the problem. The greatest amount of tact and 
pleasantness at this point was necessary to enlist her continued 
attention until all data had been secured. 

She was next asked, ‘‘Now, judging from the standpoint of 
general attractiveness, taking all factors into account, bottle, 
label, cap, etc., in what order would you place these brands?’’ 
After these last rankings were secured, if the customer did not 
indicate too much fatigue, she was asked to select the one cap 
which she preferred. (See Table 1.) All but one subject 
who started completed the first three sets of rankings, and 87 
indicated their preference of caps. The one incomplete data 
sheet was discarded. 


STATISTICAL TREATMENT OF DATA 


For purposes of averaging the data, attractiveness in each 
feature was assumed to be normally distributed. By using 
TABLE 1 

Cap Preferences 

Brand      Men    Women    Total
A          16     31       47
B           1      2        3
C          10     24       34
D           0      0        0
E           0      3        3
Total      27     60       87

the formula $\text{per cent position} = 100(R - .5)/N$ and converting 
the per cents so obtained into standard scores upon a scale of 
100, the weights of 75 for first choice, 60 for second, 50 for 
third, 40 for fourth and 25 for last place were assigned. Multiplying 
the number of times a brand was placed in each category 
by the respective weight for that category, summing these 
moments and dividing by N gives us the arithmetic mean for 
that brand in that feature. The mean preference value for 
each brand in each of the features of bottle, label and general 
attractiveness was so computed. 
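
The ranking-to-score procedure can be sketched as follows. The per cent positions come from the formula quoted above, with R the rank and N the number of brands ranked. The weights are taken from the text as given rather than recomputed from a normal-curve table, and the frequency counts in the example are hypothetical.

```python
# Weights assigned in the text to first through fifth place.
WEIGHTS = {1: 75, 2: 60, 3: 50, 4: 40, 5: 25}


def percent_position(rank, n_ranked):
    """Per cent position of a rank among n_ranked objects: 100(R - .5)/N."""
    return 100 * (rank - 0.5) / n_ranked


def mean_preference(rank_counts):
    """Weighted mean score for one brand on one feature.

    rank_counts: dict mapping rank (1..5) -> number of judges who
    placed the brand at that rank.
    """
    n_judges = sum(rank_counts.values())
    moments = sum(WEIGHTS[rank] * count for rank, count in rank_counts.items())
    return moments / n_judges


positions = [percent_position(rank, 5) for rank in range(1, 6)]

# Hypothetical distribution of 100 judgments for one brand.
example_mean = mean_preference({1: 40, 2: 25, 3: 15, 4: 10, 5: 10})
```

For five brands the per cent positions are 10, 30, 50, 70, and 90, which are the values converted into the weights 75, 60, 50, 40, and 25.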

Next, standard deviations for each brand in regard to each 
feature were figured using the formula $\text{S.D.} = \sqrt{\Sigma FD^{2}/N}$, 
F being the frequency with which the brand was ranked in a 
category, D the deviation of the weight of that category (75, 
60, 50, 40 or 25) from the mean previously calculated, and $\Sigma$ 
the sum of all such moments of all five categories for the brand 
in regard to the feature under question. Standard errors of 
all averages were also computed by the formula 
$\sigma_{\text{av.}} = \text{S.D.}/\sqrt{N}$, 
and standard errors of difference between each two consecutive 
averages using first the short formula: 
$\sigma_{\text{diff.}} = \sqrt{\sigma^{2}_{\text{av.}_1} + \sigma^{2}_{\text{av.}_2}}$. 
This presupposes no correlation between the two brands, an 
assumption which may or may not be true. If any relation 
TABLE 2 

Average Ranking of Each Brand in Each Feature in Terms of Standard 
Score, Dispersions of Distributions, Reliability of Averages, and 
Significances of Differences between Averages 

Brand A: C.H.B. or California Home Brand 
Brand B: Del Monte 
Brand C: Heinz 
Brand D: Sniders 
Brand E: Van Camps 

          Average of rankings                          D/S.D.diff.   D/S.D.diff.
Brand     Men      Women    Combined   S.D.   S.D.av.                (r = -1.00)

Feature 1. General Attractiveness 
C         61.15    59.80    60.20      14.7   1.47
                                                       2.82          2.31
B         52.51    55.17    54.30      14.8   1.48
                                                       2.07          1.46
A         49.34    49.57    49.50      17.9   1.79
                                                       1.59          1.13
E         43.85    46.56    45.75      15.3   1.53
                                                       2.60          1.84
D         40.35    39.50    40.25      14.6   1.46

Feature 2. Attractiveness of Label 
C         61.00    57.61    58.60      16.0   1.60
                                                       2.19          1.55
B         53.32    52.86    53.70      15.7   1.57
                                                       1.76          1.24
E         44.15    52.10    49.70      16.6   1.66
                                                        .72           .52
A         50.33    47.20    48.15      13.8   1.38
                                                       3.80          2.70
D         40.35    39.63    39.85      16.9   1.69

Feature 3. Attractiveness of Bottle 
C         62.25    58.25    59.45      14.7   1.47
                                                       3.11          2.21
B         53.83    53.30    53.50      12.2   1.22
                                                        .59           .43
A         51.33    52.43    52.10      20.5   2.05
                                                       3.73          2.66
D         40.15    43.49    42.50      15.6   1.56
                                                        .02           .02
E         42.52    42.43    42.45      13.0   1.30
existed the longer formula: $\sigma_{\text{diff.}} = \sqrt{\sigma^{2}_{\text{av.}_1} + \sigma^{2}_{\text{av.}_2} - 2r\,\sigma_{\text{av.}_1}\sigma_{\text{av.}_2}}$ 
should be used. In order to avoid the tedious calculation of 
all of the coefficients that would then be necessary, new S.D. 
differences were computed by the long formula substituting 
-1.00 for r. The true S.D. differences will then be less than 
those secured by the use of the second formula in this way. 
We are, however, safe in assuming that no correlations as high 
as -1.00 actually obtained. (For averages and their significances 
see Table 2.) σ and S.D. are used here interchangeably. 
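
The dispersion and significance calculations described in this section can be sketched as below. The function names are hypothetical. The worked figures use the general-attractiveness values for Brands A and E from Table 2 (means 49.50 and 45.75, S.D. 17.9 and 15.3, 100 judges), and the frequency counts passed to the S.D. function are invented for illustration.

```python
import math

WEIGHTS = {1: 75, 2: 60, 3: 50, 4: 40, 5: 25}


def sd_from_rank_counts(rank_counts):
    """S.D. = sqrt(sum(F * D^2) / N), with D the deviation of each
    category weight from the brand's weighted mean."""
    n = sum(rank_counts.values())
    mean = sum(WEIGHTS[r] * f for r, f in rank_counts.items()) / n
    return math.sqrt(sum(f * (WEIGHTS[r] - mean) ** 2 for r, f in rank_counts.items()) / n)


def se_of_mean(sd, n):
    """Standard error of an average: S.D. / sqrt(N)."""
    return sd / math.sqrt(n)


def se_of_difference(se_1, se_2, r=0.0):
    """Standard error of a difference between two averages.
    r = 0 gives the short formula; r = -1 gives the upper bound."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2)


# Hypothetical bimodal brand: half the judges rank it first, half last.
sd_bimodal = sd_from_rank_counts({1: 50, 5: 50})

# Brands A and E, general attractiveness (values from Table 2).
difference = 49.50 - 45.75
se_a, se_e = se_of_mean(17.9, 100), se_of_mean(15.3, 100)
ratio_short = difference / se_of_difference(se_a, se_e)
ratio_long = difference / se_of_difference(se_a, se_e, r=-1.0)
```

The two ratios round to 1.59 and 1.13, matching the tabled entries between Brands A and E. With r set to -1.00 the standard error of the difference reduces to the sum of the two standard errors, which is why it gives the largest possible denominator.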

Pearson correlation coefficients for each brand were then 
computed between the features of general attractiveness, 
attractiveness of label and attractiveness of bottle shape called 
factors 1, 2, and 3, respectively. From these, first order par- 
tials were figured by methods given in Garrett’s ‘‘Statistics in 
Psychology and Education.’’ Beta coefficients were then cal- 
culated for each brand giving the relative weight with which 
the bottle and the label contribute to the total general attrac- 
tiveness, and multiple R, indicating the part of this last factor 
accounted for by the combined action of bottle and label. 
Results were checked by solving again using the Doolittle 
method. (For correlation data see Table 3.) The Beta coef- 
ficients are, of course, computed as ordinary regression weights 
except that differences in variances are ruled out, the standard 
deviations being held constant. 
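
For two predictors the Beta coefficients and multiple R follow directly from the three zero-order correlations. The sketch below uses the standard two-predictor formulas, not the Doolittle solution the author used as a check; the function name is hypothetical. The inputs are the Brand E correlations reported in Table 3 (r12 = .67, r13 = .42, r23 = .33).

```python
import math


def two_predictor_regression(r12, r13, r23):
    """Return (beta_2, beta_3, multiple R) for predicting variable 1
    from variables 2 and 3, all in standard-score form."""
    denominator = 1 - r23 ** 2
    beta_2 = (r12 - r13 * r23) / denominator
    beta_3 = (r13 - r12 * r23) / denominator
    multiple_r = math.sqrt(beta_2 * r12 + beta_3 * r13)
    return beta_2, beta_3, multiple_r


# Brand E: general attractiveness (1), label (2), bottle (3).
beta_label, beta_bottle, multiple_r = two_predictor_regression(0.67, 0.42, 0.33)
label_to_bottle = beta_label / beta_bottle
```

Rounded to two places these give the tabled equation X1 = .60X2 + .22X3 with R = .70, and the ratio of the two weights is close to the 8 to 3 quoted for this brand.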


DISCUSSION OF RESULTS 
Brand A. C.H.B. or California Home 


This brand is placed at an average general attractiveness. 
It is selected as third of the five in this respect with a score of 
49.5. There seem to be no sex differences of opinion. This 
general attractiveness is made up of an almost average label 
(fourth place, score 48.15) and a slightly better than average 
bottle (third place, score 52.10). There is a slight indication 
that men may like the label better than women, but the differ- 
ence is not a reliable one. The interesting fact about this 
brand is that because of its outstanding design, especially in 
[Photographs of the five bottles: Brand A, C.H.B.; Brand B, Del Monte; Brand C, Heinz; Brand D, Sniders; Brand E, Van Camps] 

TABLE 3 

Correlations between Features for Each Brand, Regression Weights 
(Betas), and Multiple R 

Factor 1: General Attractiveness 
Factor 2: Attractiveness of Label 
Factor 3: Attractiveness of Bottle 

Brand   Correlation

A       r12      .50      Sigma 1.23    13.0
        r13      .57      R1(23)        .69
        r23      .23      X1 equals .39X2 plus .48X3
        r12.3    .46      about 4 to 5
        r13.2    .54

B       r12      .69      Sigma 1.23    10.1
        r13      .24      R1(23)        .70
        r23      .18      X1 equals .67X2 plus .12X3
        r12.3    .68      about 11 to 2
        r13.2    .54

C       r12      .64      Sigma 1.23    10.9
        r13      .46      R1(23)        .68
        r23      .41      X1 equals .54X2 plus .24X3
        r12.3    .56      about 9 to 4
        r13.2    .28

D       r12      .63      Sigma 1.23    10.7
        r13      .37      R1(23)        .69
        r23      .18      X1 equals .59X2 plus .27X3
        r12.3    .56      about 9 to 4
        r13.2    .28

E       r12      .67      Sigma 1.23    10.9
        r13      .42      R1(23)        .70
        r23      .33      X1 equals .60X2 plus .22X3
        r12.3    .56      about 8 to 3
        r13.2    .29

Average R equals .691 (averaged by Fisher’s z conversion). 


the bottle, it arouses strong likes and dislikes. A high S.D. in 
this respect and the bimodality of its distribution (centers on 
first and fifth place) indicate that people are either strongly 
attracted to the bottle or dislike it; they do not tend to be 
indifferent. The cap is highly preferred and was given first 
choice by 47 out of 87 who rated caps. This is the one brand 
where the bottle had more weight in determining the general 
attractiveness than the label, the relative weights being bottle 
6 to label 5. Its position of third place in general attractive- 
ness is fairly certain, there being only 1 chance out of 50 that 
more judgments would bring it up to second place and 1 out 
of 25 that it would drop to fourth standing. 


Brand B. Del Monte 


This brand is distinctly more attractive than average with 
a score of 54.3, being second only to Heinz. There is about 1 
chance out of 400 of it reaching first place and 1 out of 50 that 
it could drop to third position if more ratings were secured. 
Women may possibly place the brand higher than men, but the 
difference here is not reliable. This attractiveness is determined
by both a good label and a good bottle (second place).
The low S.D. in the bottle ratings indicates that the judges were
in rather close agreement about its true position. However,
in the bottle feature it runs very close to the C.H.B. brand,
and there are at least 2 chances out of 7 that its bottle is
actually less preferred than the C.H.B. bottle. The attractiveness
position of this brand is determined much more by its
label than by its bottle (5½ to 1), which indicates that although
the bottle is attractive, people tend to disregard it in judging 
the general attractiveness. 


Brand C. Heinz 


There seems to be very little doubt that this brand stands 
first in all three features. It appears to be placed higher by 
the men than by the women, but again the difference is not a 
reliable one. Its cap receives second choice, 34 out of 87 pre- 
ferring it. Its general attractiveness score of 60 places it 
definitely the highest of all the brands. The label seems to 
have about two and a third times the importance of the bottle
in determining this. The bottle, although attractive, does not
get the attention that the very excellent green and white label 
does. Its high score in all features may partly account for its 
position in the United States as the leading seller in the field. 


Brand D. Sniders 


This brand is decidedly the lowest in attractiveness with 
only one chance out of two hundred of even rising to fourth 
place. It combines by far the poorest label with almost the 
poorest bottle to achieve this result. Label contributed over 
twice as much as bottle to the total effect. Although it stands 
above Van Camps in its bottle, it is so slightly above that the 
chances are only a little better than even it could retain its lead 
if more judgments were secured. Its label is quite reliably 
established as the poorest. It was the only brand using orange 
and black as the colors for its label. No significant sex differ- 
ences of judgment appear. 


Brand E. Van Camps 


The general attractiveness of 43.85 or fourth place seems to 
be fairly reliable as this brand has only 1 chance out of 17 of 
reaching third place and 1 out of 200 of dropping to last 
place. This is the product of an average label (49.7) and a 
very poor bottle (42.25) or last place. One of the most signifi- 
cant peculiarities noticed is the great difference between the 
men’s ranking and the women’s. Seemingly the women are 
attracted much more to its simple design than the men. The 
difference here is a statistically significant one. 


GENERAL DISCUSSION 


The average multiple R of .69 would probably be higher if 
the judgments were more reliable. It is evident that the bottle 
and the label should together contribute more to the total 
attractiveness than these coefficients show. 

In an earlier study (7), unpublished, the writer investigated 
the three factors used here and also catsup color. These were
judged under laboratory conditions with each factor presented
separately, hence bottles devoid of labels, labels on identical
bottles, etc. The subjects were college students. Only Brands
A, C, D, and E were used at that time. The factors of recall 
of brand name and taste preferences were included in this 
earlier study. The two studies showed the following over- 
lapping of results: 

1. In general attractiveness the brands A, C, D, and E were 
placed in the same relative order. 

2. In attractiveness of label the brands were arranged in the 
same order by the combined judgments. 

3. In attractiveness of bottle brands A, C, and D were 
placed in the same order. Brand E at the time of the first 
study had a bottle shaped like Brand C and was ranked first 
instead of last. Between the two studies it changed from the 
most attractive to the least attractive shape. 

4. Incomplete correlation results in the first study showed 
the greater weight of label over bottle with the exception of 
Brand A, as in this study. 

The results of these studies would be applicable to other 
brands of tomato catsup only in so far as it might be assumed 
that the brands studied were typical of the whole field. 


SUMMARY OF FINDINGS 


1. In four brands of tomato catsup the label contributes 
from 2 to 5 times as much as the bottle in the determination of 
general attractiveness. 

2. In one brand an unusually styled bottle showed a slightly 
greater contribution than the label to the general attractive- 
ness. 

3. Certain methods for attacking problems of this type are
suggested by the study.


NEWS AND NOTES


William Frederick Book, former head of the department of psychology 
and philosophy, Indiana University, died at his home, Long Beach, Cali- 
fornia, May 22, at the age of sixty-eight years. From 1906 to 1912, Dr. 
Book was professor of psychology at Montana State University, Missoula. 
In the latter year he returned to his alma mater, Indiana University, as 
professor of psychology and in 1917 became head of the department. He 
retired in 1934.—School and Society. 

Dr. Book was for some years editor of this Journal. 


The Psychological Corporation announces a luncheon for its Directors 
and Research Associates to be held on Wednesday, September 4, during 
the meetings of the American Psychological Association at State College, 
Pennsylvania. 


Dr. Karl M. Dallenbach, professor of psychology, Cornell University, 
offered courses in elementary psychology and conducted a seminar on 
Attention at the summer session of the University of California.—School 
and Society. 


The University of Pennsylvania is completing arrangements for a Bi- 
centennial Conference at which the list of speakers will include more than 
200 American and European scholars and leaders in various fields of 
science and thought. 

The conference, to be held on the University’s campus in Philadelphia 
from September 16 to 20, this year, will form part of a Bicentennial Week 
program commemorating the 200th anniversary of the origin of the 
University of Pennsylvania. 

Six general fields—the Fine Arts, Humanities, Medical Sciences, 
Natural Sciences, Social Sciences, and Religion—will be covered by the 
conference, during the course of which there will be eighteen general 
sessions and fifty-nine symposia. 

In the field of the Humanities the program is designed to bring out the 
continuity of culture, while in other fields the objective is to reveal the 
trends of modern thought and the advances of science. 

Membership in the five-day conference will carry with it the privilege of 
attending the general sessions and symposia and will be open without 
charge, upon application and within the limit of accommodations, to those 
interested in the program.

Applications for membership, of which more than 3,000 already have 
been received in response to invitations issued by the Bicentennial Com- 
mittee, may be addressed to the Registrar of the Bicentennial Conference, 
University of Pennsylvania, Philadelphia. 


A new department of psychology has been established in the College 
of Arts and Sciences, the University of Nebraska, with Arthur F. Jenness, 
associate professor of psychology, as chairman. The psychological labo- 
ratory at the university has existed since 1889, but until now it has been 
a part of the department of philosophy and psychology. The staff of the 
new department, in addition to Dr. Jenness, consists of Donald W.
Dysinger and William E. Walton, assistant professors, and W. S. Gregory
and Roger W. Russell, instructors. No change has been made in the 
department of educational psychology and measurements in the Teachers 
College. 

J. P. Guilford, first director of the bureau of instructional research, 
University of Nebraska, has accepted a professorship in psychology at the 
University of Southern California. H. M. Cox has been named Dr.
Guilford’s successor.—School and Society.


The annual meeting of the American College Personnel Association will 
be held in Atlantic City, New Jersey, from February 18 to 21, with head- 
quarters at the Chalfonte-Haddon Hall Hotels. Advance indications from 
the Program Committee promise a program of unusual interest. The 
Annual Report of the St. Louis meetings will soon be off the press and can 
be secured with other publications by sending a check for four dollars for 
membership dues to Dr. James A. McClintock, Drew University, Madison, 
New Jersey. 


The American Council on Education has recently published a study 
prepared for the American Youth Commission by Allison Davis and 
John Dollard entitled Children of Bondage. This is the first volume of a 
series of four dealing with the personality development of Negro youth. 
Investigators were sent north, south and west to mingle with Negro youth 
and to achieve an intimate knowledge of their lives. The field studies 
for this volume were made over a period of thirteen months by five inter- 
viewers, four of whom were Negroes. Thirty adolescents were interviewed 
several times weekly over a period of from four to seven months. The 
technique employed by the authors represents an integration of psycho- 
analytic and sociological principles. The entire work supports the 
authors’ thesis that ‘‘social class governs a much wider area of the child’s 
training than do the Negro-white controls.’’ The eight children whose 
personalities are examined in greatest detail represent all of the class
positions in Negro society, and their experiences illustrate the fundamental
controls which each class exercises over the socialization of its members. 
The authors of the volume are an unusual team. Allison Davis, a Negro, 
was graduated from Williams College in 1924 and has been for five years 
professor of anthropology and head of the division of social sciences at 
Dillard University in New Orleans. John Dollard is research associate 
in sociology at the Institute of Human Affairs, Yale University. 


We hope that the readers of the JOURNAL OF APPLIED PSYCHOLOGY will
approve of the reprinting of the letter which appears below: 


ATTACK ON NERVES* 


NAZIS’ ‘SECRET WEAPON’ HELD TO BE MENTAL BREAKDOWN
To the Editor of The New York Times: 

America is still relatively oblivious to Hitler’s ‘‘secret weapon.’’ The 
type of national defense necessary for the United States against this 
‘‘weapon’’ is as yet unknown.

Ten years of life behind the barrier of totalitarianism can give one 
adequate understanding of the menace facing us today. War by land, sea 
or air is not the real menace. That menace is intellectual and moral col- 
lapse. 

Since November, 1918, German scientific research has been at work
perfecting what it has today produced—a mechanism capable of converting 
human beings into masses of purposeless flesh, incapable of resistance in 
the face of an advancing military force. 

The technique of this German process resolves itself into a systematic 
formula: selection of certain myths, superstitions, ‘‘ideals’’ and political 
rationalizations; the constant repetition of these ideas by radio, news- 
papers, cinematographs, etc., resulting in the disorientation of the opinions 
and emotional attitudes of the great mass of the civil population. 


ARMED FORCE NEXT

Then renewed and progressively more vitriolic attacks and threats—in 
the newspapers, by radio, by the incessant sound of airplanes overhead, by 
bombastic public speeches before vast masses of emotionally excited people, 
and by incessant recounting of the horrors of the devastation wrought upon 
the civil population of neighboring countries already attacked by the 
advancing forces. 

Then comes physical collapse—Blitzkrieg. At the moment of military 
advance by land and air, properly timed in relation to the state of turmoil 
of the civil population, the onrushing military forces find the opposing 
military and civil population inert, resistless, quickly succumbing to 
exhaustion before the impact. 

The foregoing epitomizes the devastating, scientifically systemized 
method which the German scientists have designed to effect the moral 
disorganization of the civil population. This is Hitler’s long-heralded 
‘‘secret weapon.’’ Is America prepared to challenge it effectively?

This is a challenge to the men of American science. Our institutions
are abundantly staffed with scientists in the fields of psychology, in all its
multiform ramifications, who should now be summoned immediately to
analyze the principles underlying the technique of this new German menace,
and to give us quickly some suggestions how to save this country from the
fate meted out to Poland, Belgium and France.

* Reprinted from The New York Times, Sunday, July 14, 1940.

Our most pressing need today is a Congressional appropriation for the 
creation and staffing of adequate laboratories with scientists qualified to 
analyze and devise methods of combating this German technique. 


APPEAL TO SCIENCE 

The American war machine is being designed to combat physical force 
on land and sea and in the air. But the most potent enemy approaching 
our shores is not coming as a physical force; the enemy now confronting 
us is a mechanism of nerve-war. 

The purpose of this letter—from one who has come from ten years’ 
experience behind the totalitarian lines—is to appeal to the leaders of 
American science to create among themselves a functioning unit, consti- 
tuted at first of only five or ten specialists in human psychology, publicity 
and allied problems, under the aegis of one or more of our leading Ameri- 
can universities or scientific institutions or societies. 

This must be a group with the time and resources at their disposal, 
willing and qualified to make a preliminary investigation of the foregoing 
suggestions, and to plan quickly for an adequately comprehensive further 
study of the strategy of this new war of words and nerves, leading to the 
formulation and recommendation of adequate lines of defense, national in 
scope, in this new war against human reason—all of which would in turn 
materially contribute to the formulation of the aims which are very soon 
destined to govern the formulation and development of our national 
defense program. 

THOMAS W. HUNTINGTON.

Dublin, N. H., July 10, 1940. 

Address after Aug. 1, 1940: 142 Chestnut Street, Boston, Mass. 














BOOK REVIEWS 


ABRAMSON, JADWIGA. L’enfant et l’adolescent instables. Etudes cliniques
et psychologiques. Paris, Alcan. 1940, pp. xix + 390.

The ‘‘unstable child’’ is a generic term which no one has acceptably 
defined with any very great precision. It includes children who are 
hyperactive, emotional, delinquent, vacillating, psychopathic, and so on 
through a list inclusive of almost every kind of behavior difficulty. How- 
ever, in spite of the wide variations in connotation, there is fairly uni- 
versal practical agreement as to what is meant by an unstable child. 
Dr. Abramson reviews at length the descriptions and definitions presented 
in the literature, but perhaps wisely refrains from formulating any specific 
characterization of these children. 

The analysis of the nature and development of instability is based upon 
horizontal and longitudinal studies of 1,117 children seen first during the 
years 1926 to 1928 at the Clinique Annexe de Neuro-Psychiatrie Infantile.
These children were selected from a total of 2,212 patients seen during 
these years. The statistical treatment of the data is supported by refer- 
ence to eighty-seven case histories typical of various kinds of behavior 
included. 

The major analyses were made of intellectual, motor, and emotional 
development in these children in comparison with similar development 
in unselected children. In the intellectual area she finds the unstable bet- 
ter in memory and verbalization than in reasoning and logic; they are 
more concerned with the immediate concrete rather than with imagi- 
nation; and their interests are more characteristic of younger ages. More 
important than such specific findings is the evidence of inconsistency in 
the course of development of various intellectual traits. In motor and 
manual aptitudes the unstable are inferior. It is, however, in the affective 
area that the most characteristic differences are found. 

Dr. Abramson analyses affective development into four major stages. 
The first, during the first and second years, is characterized by impulsive- 
ness, indetermination and an external rather than self reference. During 
the second stage, from two to five years, there is self-formation and con- 
flict between this self and the external world. It is characterized by 
negative attitudes and behavior. Childhood, from five years to puberty, 
represents the third stage and it is during these ages that the child 
achieves socialization and conquest of the external world. The fourth 
stage begins with puberty and continues through adolescence. Here again
there is conflict with the surrounding world, especially in relation to sex,
religion, social forms, and with older people. Observation of unstable
children through each of these four stages and their various subdivisions
indicates that the character of the first two stages persists in some cases
even through adolescence. On the other hand, the appearance of
characteristics of the last two stages is retarded; in effect, the development
does not go beyond the normal for much younger ages. In the affective,
as in the intellectual areas, the development is marked by inconsistency 
and disharmony. 

The etiology of instability is complex, and may vary from case to case. 
Heredity, childhood illness, school, vocation, family, race and sex are all 
discussed in the light of their etiologic significance. No one of these, but 
the interplay among all of them, is of major importance in the origin of 
instability. 

C. M. LOUTTIT,
Indiana University 


Levine, A. J. Current Psychologies: A Critical Synthesis. Sci-Art 
Publishers, pp. 262. 

After eight chapters in which are pointed out the tenets of some systems
of psychology, the author comes out with an explanation of why
there are several views, or at least why everyone is not a behaviorist. 

The list of schools dealt with here is a rather unconventional one to say 
the least. It starts with physiological psychology and branches off into 
behaviorism and functionalism within the first chapter. 

Two chapters are interposed between this and Gestalt; one, issues in 
personality studies, the other, relationships in mental disorders. The 
growth of Gestalt through Gestaltqualität is presented and followed by
‘‘laws’’ of Gestalt organization and topological psychology.

The final systems treated are the purposivists and the psychoanalysts. 

It seems that in chapter nine the author ‘‘hits his stride’’ and begins 
to write his own opinions rather than explaining things in which he does 
not believe, which he did in chapters one to eight. 

However, in this place, attempting to give reasons for disagreement, 
a short space is devoted to ‘‘the confusion created by the use of meta- 
phorical language.’’ The present reviewer should like to point out that 
the contents of this book are not couched in the clearest language. If 
this is to inaugurate a series of text-books, as the publisher’s note states, 
the author might well have made some concession to students and pre- 
sented the material in a more lucid form. 

Although the critical synthesis is satisfactory, it cannot be expected 
that this book will find favor in the hands of the average undergraduate. 
From the reviewer’s point of view both Heidbreder’s Seven Psychologies
and Keller’s The Definition of Psychology are to be recommended in
preference to this. 
K. W. OBERLIN, 
University of Delaware 


BUXTON, CLAUDE E. Latent Learning and the Goal Gradient Hypothesis.
Durham, N. C. Duke University Press. 1940. pp. ix+75. 

The use of mazes for studying the learning of rats is an old story. The 
common procedure is to start the animal at one end of the maze and to 
feed it when it reaches the other end. It was long thought that the 
animal’s learning consisted merely in the formation of connections in his 
nervous system such that locomotion through one part of the maze sim- 
ply set off the responses required for locomotion through the succeeding 
part. 

Dr. Buxton’s investigation, however, which used a development of the 
latent learning method throws doubt upon this simple explanation. He 
had his rats live in a complex maze, without any reward, for varying num- 
bers of nights so that these animals acquired some familiarity with its 
total structure; but without having a special point that was for them 
the beginning and another point that was for them the end. If, now, the 
learning of the mazes consists simply in the establishment of connections 
from the end of the maze back to its beginning, these rats should have little 
or no advantage over rats that had never been in the maze when they are 
put in the situation with a food reward at the end. Since, however, they 
showed a very material advantage over control animals, Dr. Buxton is
forced to conclude that maze learning is not so much a matter of connections
as of the reorganization by the animal of its psychological field or
its behavioral environment. He proceeds to develop a field-theoretical 
account of learning in this situation which appears to have important im- 
plications for learning theory in general. 


GILLILAND, A. R., AND CLARK, E. L. Psychology of Individual Differences.
New York: Prentice-Hall, Inc. 1939. Pp. xvi+ 535. 

The treatment of individual differences in this book follows the broad 
outlines set by Binet and Henri’s article, La Psychologie Individuelle, and
Stern’s Über Psychologie der individuellen Differenzen. The first chapter
is thus devoted to the historical antecedents of the subject. Individual 
differences have been acknowledged from earliest times, but the first sys- 
tematic formulation of such differences is found in Plato’s Republic. 
Plato would have every person in his ideal state do what he was best 
fitted for by nature and nurture. The problem received further elaboration 
at the hands of Aristotle, but soon afterwards came into disrepute and 
remained so until it was resuscitated by the educators of the late 18th 
and the early 19th centuries, that is, by such men as Rousseau, Pestalozzi,
Herbart, and Froebel. The Middle Ages had little regard for the individual
and consequently could not be bothered investigating individual
differences. The descriptive writers simply ignored them, while the 
experimentalists, such as Helmholtz, regarded them as experimental 
errors and did not investigate them any further. Even the great Wundt, 
as recently as the late 19th century, still frowned on them. It was only 
as a result of the work of Galton, Pearson, Cattell and others, that indi- 
vidual differences finally took their proper place in psychological theory. 

In chapters II and III the causes of individual differences and the
methods of measuring these differences are presented. The causes ob- 
viously reduce to two: hereditary and environmental. To what extent 
individual differences are due to the one, the other, or a combination of 
the two is difficult to determine, because it is impossible completely to 
separate the effects of heredity from those of environment. The authors 
point out that ‘‘whenever two variables are to be found in a problem, 
. . . three methods of attack are possible (1) each variable may be
studied separately; (2) one variable may be held constant while the other 
is varied and measured; (3) both variables may vary together but the 
amount of variation of one must be known’’ (pp. 28-29). Unfortunately, 
none of these methods is altogether applicable to a study of the causes 
of individual differences. Nevertheless, they have been used, and what 
knowledge we have of the causes of individual differences is largely 
derived from them. Family case studies, studies of foster children, and 
identical and fraternal twins, exemplifying the first two methods, and 
experimentally controlled studies, exemplifying the third method, have 
all contributed to our understanding of the causes of individual differ- 
ences. The authors also provide an explanation of the mechanism of 
inheritance and maturation, and show how environment affects individual
differences, concluding their third chapter with a brief introduction to the 
statistical techniques used in psychology. 

Chapters IV, V and VI are given over to a discussion of physical, sex, 
and race differences. The authors point out that most of the physical dif- 
ferences found between individuals arrange themselves in a normal distri- 
bution curve for the population as a whole. Experimental evidence also 
shows that these differences are mainly due to the multiple combinations
possible between the male and female germ cells at conception. Environ- 
ment, however, and especially pre-natal environment, exerts an appreciable 
influence upon the shaping of the physical features of the individual. 
With respect to the supposed causative relationship between physique and 
intellect, the authors espouse the popular view of the day, namely, that 
although a small positive relationship is found between such physical and 
physiological factors as body weight and height, cephalic index, dental 
defects, hookworm, blood groups, endocrine glands, and intellectual 
capacity, the relationship is so small that it has no statistical significance.

Sex differences exist, but aside from differences due to physical nature, 
they are mostly due to the environmental influences to which we are ex- 
posed. Recent experimentation points to differences between the sexes 
in several respects. As far as general intelligence is concerned, no reliable 
difference has been found; but in the matter of specific capacities, it has 
been repeatedly shown that women excel in verbal ability and memory, 
while men are more adept at mechanical tasks and mathematics. 
Women are more emotional, less self-sufficient, more introvert, more 
submissive, more sociable, more co-operative, and more self-controlled 
than men. Slight differences are also apparent in sensory and motor 
capacities. While giving a fairly representative review of the studies 
made on sex differences, the authors draw the inexcusable conclusion that 
‘‘some of these differences are manifest at birth and are therefore surely
innate’’ (p. 136). To be present at birth is no proof that a trait is 
innate. Prenatal environment plays a tremendous part in development. 

The study of race differences has had to meet with two important dif- 
ficulties. First, because of extensive intermarriage between different 
peoples it is almost impossible at the present day to find a criterion on the 
basis of which a proper distinction can be drawn between races. Secondly,
the studies of race differences have been conducted almost exclusively
with subjects representative of the different races and nationalities
living in America. These subjects, however, cannot be regarded as truly 
representative of the peoples from which they spring, and consequently the 
results obtained by testing them are largely spurious as far as race differ- 
ences are concerned. In view of these difficulties, the differences found 
are generally traced to cultural influences. It is only in this light that the 
differences between the Whites, Negroes, Indians, and Orientals studied in 
America take on meaning. 

The question as to whether family resemblances are due to heredity or 
environment is considered in chapter VII. Studies made with identical 
twins, fraternal twins, and siblings brought up apart, force the authors 
to conclude that ‘‘siblings ordinarily correlate about .50 on both physical 
and mental traits. Fraternal twins generally correlate .70 to .80 when 
reared together. Identical twins living together correlate .90 to .95’’ 
(p. 246). It is pointed out that the earlier investigators of this problem, 
men like Galton and Thorndike, emphasized heredity as the all-important 
factor of family resemblances; while the present day tendency seems to 
be in the opposite direction. The authors caution the reader against 
taking the modern theory too seriously, believing that it may be only a 
temporary fad. 

The discussion of differences in intelligence also follows the traditional 
line. The subject is introduced by an attempt to define intelligence. 
The authors favor a definition in terms of ‘‘the ability to make successful
adjustments in life’’ (p. 250). The nature of intelligence, whether
intelligence is a unitary trait or consists of a few or many factors, is
slightly touched upon. The growth and deterioration of intelligence are
considered, and the discussion of this topic stops with an evaluation of
the cultural and educational influences upon the growth of intelligence.

A few pages of the book are devoted to a discussion of types of extreme 
deviations from the normal such as the feebleminded, the genius, the insane,
the criminal, and those with sensory and motor defects. The outcome
of this discussion is the suggestion that by whatever means possible
the level of society should be raised if society is to make the best of what 
science has to offer. This would entail an attempt at decreasing the 
number of sub-normals, and increasing the number of super-normals. 
The methods suggested are sterilization, segregation, and birth control. 
The logical conclusion of such a method of improving the level of society 
would be the superman. 

It is asserted by the authors that there are perhaps more individual 
differences in personality than in any other department of human nature.
Because of this fact, a thorough knowledge about these differences is
imperative for both educational and vocational guidance purposes. Unfortunately,
so far the tests that we have devised to measure personality
quantitatively have proven of comparatively little value. Illimitable pos- 
sibilities in this direction are foreseen, however. For the present we must 
be patient and continue our research. More emphasis should be put, not 
upon the construction of additional tests, but upon the validation and 
standardization of the tests we already possess. We shall have to revert 
from the initial attempt to measure general intelligence and general per- 
sonality, to a measurement of specific tendencies. Modern aptitude test- 
ing, as exemplified in the application of individual differences in the fields 
of education, business and industry, is a step in the right direction. Little 
hope is attached to a physiological approach to the study of personality. 

The treatment of the problem of individual differences in this book is
too casual. The organization and vigorous presentation of research
found in Anastasi’s book are lacking. While the book was published
last year, no references of any significance go beyond 1937. Too many
hasty generalizations crop up, and as far as this reviewer can make out,
no matter is presented that has not already been touched upon by Ellis,
Freeman, and Anastasi, in their books on the subject.

PETER HAMPTON,
University of Manitoba